Computer security policies often are stated informally in terms of
confidentiality, integrity, and availability of information and resources; these
policies can be qualitative or quantitative. To formally quantify
confidentiality and integrity, a new model of quantitative information flow is
proposed in which information flow is quantified as the change in the accuracy
of an observer's beliefs. This new model resolves anomalies present in previous
quantitative information-flow models, which are based on change in uncertainty.
And the new model is sufficiently general that it can be instantiated to measure
either accuracy or uncertainty. To formalize security policies in general, a
generalization of the theory of trace properties (originally developed for
program verification) is proposed. Security policies are modeled as
hyperproperties, which are sets of trace properties. Although important security
policies, such as secure information flow, cannot be expressed as trace
properties, they can be expressed as hyperproperties. Safety and liveness are
generalized from trace properties to hyperproperties, and every hyperproperty is
shown to be the intersection of a safety hyperproperty and a liveness
hyperproperty. Verification, refinement, and topology of hyperproperties are
also addressed. Hyperproperties for system representations beyond trace sets are
investigated.

CHAPTER 1


INTRODUCTION 


Computer  security  policies  express  what  computer  systems  may  and  may  not 
do.  For  example,  a  security  policy  might  stipulate  that  a  system  may  not  allow 
a  user  to  read  information  that  belongs  to  other  users,  or  that  a  system  may 
process  transactions  only  if  they  are  recorded  in  an  audit  log,  or  that  a  system 
may  not  delay  too  long  in  making  a  resource  accessible  to  a  user.1 

This dissertation addresses mathematical foundations for security policies,
in two ways. First, metrics are developed for quantifying how much secret
information a computer system can leak, and for quantifying the amount of
trusted information within a computer system that becomes contaminated. Second,
a taxonomy is proposed for formal, mathematical expression and classification
of security policies. These contributions are best understood in the context
of a select history of computer security policies.

1.1  Historical  Background 


Security policies have long been formulated in terms of a tripartite taxonomy:
confidentiality, integrity, and availability. Henceforth, this is called the CIA
taxonomy. There is no agreement on how to define each element of this taxonomy,
as evidenced by table 1.1, which summarizes the evolution of the CIA taxonomy
in academic literature, standards, and textbooks.2

1 Security policies might also express what human users of computer systems may
or may not do (for example, that users may not remove machines from a building).
This dissertation focuses on computers, not humans; Sterne [111] discusses the
relationship between these two kinds of policies.

2 Nor is there agreement on what abstract noun to associate with the elements of
this taxonomy. Various authors use the terms "aspects" [16, 47], "categories of
protection" [31], "characteristics" [97], "goals" [26, 97], "needs" [72],
"properties" [58], "qualities" [97], and "requirements" [92].

Table 1.1: Definitions of the CIA taxonomy. Confidentiality, integrity, and
availability are abbreviated C., I., and A.

Source | Year | Term | Definition
Voydock and Kent [121] | 1983 | N/A | Security violations can be divided into... unauthorized release of information, unauthorized modification of information, or unauthorized denial of resource use.
Clark and Wilson [26] | 1987 | N/A | System should prevent unauthorized disclosure or theft of information, ... unauthorized modification of information, and... denial of service.
ISO 7498-2 [58] | 1989 | C. | Information is not made available or disclosed to unauthorized individuals, entities, or processes.
 | | I. | Data has not been altered or destroyed in an unauthorized manner.
 | | A. | Being accessible and useable upon demand by an authorized entity.
ITSEC [30] | 1991 | C. | Prevention of unauthorized disclosure of information.
 | | I. | Prevention of unauthorized modification of information.
 | | A. | Prevention of unauthorized withholding of information or resources.
NRC [92] | 1991 | C. | Controlling who gets to read information.
 | | I. | Assuring that information and programs are changed only in a specified and authorized manner.
 | | A. | Assuring that authorized users have continued access to information and resources.
Pfleeger [97] | 1997 | C. | The assets of a computing system are accessible only by authorized parties. The type of access is read-type access.
 | | I. | Assets can be modified only by authorized parties or only in authorized ways.
 | | A. | Assets are accessible to authorized parties.
Gollmann [47] | 1999 | C., I., A. | Same as ITSEC.
Lampson [72] | 2000 | Secrecy | Controlling who gets to read information.
 | | I. | Controlling how information changes or resources are used.
 | | A. | Providing prompt access to information and resources.
Bishop [16] | 2003 | C. | Concealment of information or resources.
 | | I. | Trustworthiness of data or resources... usually phrased in terms of preventing improper or unauthorized change.
 | | A. | The ability to use the information or resource desired.
Common Criteria [31] | 2006 | C. | Protection of assets from unauthorized disclosure.
 | | I. | Protection of assets from unauthorized modification.
 | | A. | Protection of assets from loss of use.

Perhaps the most widely accepted, current definitions (if only because of
adoption by North American and European governments) are those given by the
Common Criteria [31, §1.4], an international standard for evaluation of computer
system security:

•  Confidentiality  is  the  protection  of  assets  from  unauthorized  disclosure. 

•  Integrity  is  the  protection  of  assets  from  unauthorized  modification. 

•  Availability  is  the  protection  of  assets  from  loss  of  use. 

The  term  "assets"  is  essentially  undefined  by  the  Common  Criteria.  From  the 
other  definitions  in  table  1.1,  we  surmise  that  assets  include  information  and 
system  resources. 

These definitions of the CIA taxonomy raise the question of how to distinguish
between unauthorized and authorized actions. Authorization policies have
been developed to answer this question. In the vocabulary of authorization
policies, a subject generalizes the notion of a user to include programs running
on behalf of users. Likewise, object generalizes "information" and "resource,"
and right is used instead of "action." Every subject can also be treated as an
object, so that subjects can have rights to other subjects. Authorization policies
can be categorized as follows:

•  Access-control policies regulate actions directly by specifying for each
subject and object exactly what rights the subject has to the object. File-system
permissions (e.g., in Unix or Microsoft Windows) embody a familiar example of
an access-control policy, in which users may (or may not) read, write, and
execute files. Access-control policies originated in the development of
multiprogrammed systems for the purpose of preventing one user's program from
harming another user's program or data [74].3

3Lampson  [74]  gives  the  canonical  formalization  of  access-control  policies  as  matrices  in 
which  rows  represent  subjects,  columns  represent  objects,  and  entries  are  rights. 

•  Information-flow policies regulate actions indirectly by specifying, for each
subject and object, whether information is allowed to flow between them.
This specification is used to determine what actions are allowed. Multilevel
security, formalized by Bell and LaPadula [13] and by Feiertag et al. [43], is a
familiar example of an information-flow policy that is used to govern
confidentiality: Each subject is associated with a security level comprising a
hierarchical clearance (e.g., Top Secret, Secret, or Unclassified) and a
non-hierarchical category set (e.g., {Atomic, NATO}). Information is permitted
to flow from a subject S1 to subject S2 only if the clearance of S1 is less than
or equal to the clearance of S2 and the category set of S1 is a subset of the
category set of S2.4 Noninterference, defined by Goguen and Meseguer [46], is
another important example of an information-flow policy. It stipulates that
commands executed on behalf of users holding high clearances have no effect on
system behavior observed by users holding low clearances. This policy, or a
variant of it, is enforced by many programming language-based mechanisms [104].
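
The multilevel flow rule stated above can be sketched in a few lines of Python. This is only an illustration: the numeric ordering of clearances, the class name SecurityLevel, and the function name may_flow are assumptions introduced here, not part of the Bell-LaPadula or Feiertag et al. formalizations.

```python
from dataclasses import dataclass

# Assumed total order on the clearances named in the text (higher = more secret).
CLEARANCE_ORDER = {"Unclassified": 0, "Secret": 1, "Top Secret": 2}


@dataclass(frozen=True)
class SecurityLevel:
    """A security level: a hierarchical clearance plus a category set."""
    clearance: str
    categories: frozenset


def may_flow(src: SecurityLevel, dst: SecurityLevel) -> bool:
    """True if information may flow from a subject at src to a subject at dst.

    Both conditions from the text must hold: the source clearance is at most
    the destination clearance, and the source categories are a subset of the
    destination categories.
    """
    return (CLEARANCE_ORDER[src.clearance] <= CLEARANCE_ORDER[dst.clearance]
            and src.categories <= dst.categories)


# Hypothetical subjects used only for illustration.
low = SecurityLevel("Secret", frozenset({"NATO"}))
high = SecurityLevel("Top Secret", frozenset({"Atomic", "NATO"}))
other = SecurityLevel("Top Secret", frozenset({"Atomic"}))
```

Here may_flow(low, high) holds, whereas may_flow(high, low) fails on clearance and may_flow(low, other) fails on categories, even though other has the higher clearance.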

When used to govern confidentiality of information, access-control policies
regulate the release of information in a system, whereas information-flow
policies regulate both the release and propagation of information. Thus
information-flow policies are stronger than access-control policies. For example,
an information-flow policy might require that the information in file f.txt does
not become known to any user other than alice. A Unix access-control policy
on file f.txt might approximate the information-flow policy by stipulating that

4 The first mathematical formalization of security-level comparison seems to be
a result of Weissman [124]; a more general formalization in terms of lattices
was given by Denning [36]. Differences between the Bell-LaPadula and Feiertag et
al. models of multilevel security are discussed by Taylor [114]. Multilevel
security, in addition to being an information-flow policy, is an example of a
mandatory access control (MAC) policy. In contrast are discretionary access
control (DAC) policies, such as Unix file-system permissions.

only alice can execute a read operation on f.txt. But a Trojan horse5 running
with the permissions of alice would be allowed, according to the access-control
policy, to copy f.txt to some public file from which anyone may read. The
contents of f.txt would no longer be secret, violating the information-flow
policy.

Malicious programs such as a Trojan horse might exploit channels, or
communication paths, other than the file system to violate information-flow
policies. Lampson introduces the notion of a covert channel, which is a channel
"not intended for information transfer at all" [73], such as filesystem locks,
system load, power consumption, or execution time.6 The Department of Defense
later defined a covert channel somewhat differently in its Trusted Computer
System Evaluation Criteria (TCSEC, also known as the "Orange Book" because of
its cover) as "any communication channel that can be exploited by a process to
transfer information in a manner that violates the system's security policy"
[37]. The TCSEC categorizes covert channels into storage and timing channels.
Storage channels involve reading and writing of storage locations, whereas
timing channels involve using system resources to affect response time [37].7

Rather than forbid the existence of covert channels, the TCSEC specifies
that systems should not contain covert channels of high bandwidth.8
Low-bandwidth covert channels are allowed only because eliminating them is
usually infeasible. And sometimes elimination is impossible: the proper function
of some systems requires that some information be leaked. One example of such

5A  Trojan  horse  [7]  is  a  program  that  offers  seemingly  beneficial  functionality,  so  that  users 
will  run  the  program — even  if  the  program  is  given  to  them  as  a  gift  and  they  do  not  know  its 
provenance  or  contents.  But  the  program  also  contains  malicious  functionality  of  which  users 
are  unaware. 

6Lampson  also  introduces  "storage"  and  "legitimate"  channels.  The  distinctions  between 
these  and  covert  channels — as  Millen  [89]  observes — are  somewhat  elusive. 

7Kemmerer  [62]  seems  to  be  the  source  of  TCSEC's  categorization. 

8The  TCSEC  defines  "high"  as  100  bits  per  second,  the  rate  at  which  teletype  terminals  ran 
circa  1985.  The  "Light  Pink  Book"  [91]  offers  a  more  nuanced  analysis  of  what  constitutes  high 
bandwidth. 

a system is a password checker, which allows or denies access to a system based
on passwords supplied by users. By design, a password checker must release
information about whether the passwords entered by users are correct.

Research into quantifying the bandwidth of covert channels began by employing
information theory, the science of data transmission. Information theory
could already quantify communication channel bandwidth, so its use with
covert channels was natural. Denning's seminal work [35] in this area uses
entropy, an information-theoretic metric for uncertainty, to calculate how much
secret information can be leaked by a program. Millen [88] proposes mutual
information, which is defined in terms of entropy, as a metric for information
flow. These metrics make it possible to quantify information flow.

Much more history of computer security policies could be surveyed, but
what we have covered suffices to put this dissertation in context. The beginning
(a taxonomy of security policies) and the end (quantification of information
flow) of our background are the places where this dissertation makes its
contributions.

1.2  Contributions  of  this  Dissertation 

Quantification of security. Quantification of information flow is more difficult
than at first it might seem. Consider a password checker PWC that sets
an authentication flag a after checking a stored password p against a (guessed)
password g supplied by the user.

PWC  :  if  p  =  g  then  a  :=  1  else  a  :=  0 

For simplicity, suppose that the password is either A, B, or C. Suppose also that
the user is actually an attacker attempting to discover the password, and he
believes the password is overwhelmingly likely to be A but has a minuscule and
equally likely chance to be either B or C. (This need not be an arbitrary
assumption on the attacker's part; perhaps the attacker was told by a usually
reliable informant.) If the attacker experiments by executing PWC and guessing
A, he expects to observe that a equals 1 upon termination. Such a confirmation
of the attacker's belief would seem to convey some small amount of information.
But suppose the informant was wrong: the real password is C. Then the attacker
observes that a is equal to 0 and infers that A is not the password. Common
sense dictates that his new belief is that B and C each have a 50% chance of
being the password. The attacker's belief has greatly changed (he is surprised
to discover the password is not A), so the outcome of this experiment conveys
more information than the previous outcome. Thus, the information conveyed
by executing PWC depends on what the attacker initially believed.
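
The attacker's reasoning in this example can be sketched as conditioning a belief on the observed flag. The sketch below is illustrative only: the prior probabilities 0.98, 0.01, and 0.01 are assumed numbers chosen to match "overwhelmingly likely" and "minuscule," and the function names pwc and update_belief are hypothetical.

```python
def pwc(p, g):
    """The password checker PWC: returns the authentication flag a."""
    return 1 if p == g else 0


def update_belief(belief, guess, observed_a):
    """Condition a belief (dict: password -> probability) on the observed flag.

    Passwords inconsistent with the observation get probability 0, and the
    remaining probabilities are renormalized to sum to 1.
    """
    consistent = {pw: pr for pw, pr in belief.items()
                  if pwc(pw, guess) == observed_a}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("observation has probability zero under this belief")
    return {pw: consistent.get(pw, 0.0) / total for pw in belief}


# Assumed prior: A is overwhelmingly likely, B and C equally unlikely.
prior = {"A": 0.98, "B": 0.01, "C": 0.01}

# Real password is C, attacker guesses A and observes a = 0.
post_wrong = update_belief(prior, "A", pwc("C", "A"))

# Real password is A, attacker guesses A and observes a = 1.
post_right = update_belief(prior, "A", pwc("A", "A"))
```

Under these assumed numbers, post_wrong assigns probability one half each to B and C, matching the common-sense conclusion in the text, while post_right merely sharpens an already confident belief in A.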

How much information flows from p to a in each of the above experiments?
Answers to this question have traditionally been based on change in uncertainty,
typically quantified by entropy or mutual information: information flow
is quantified by the reduction in uncertainty about secret data [19, 24, 35, 49,
76, 82, 88]. Observe that, in the case where the password is C, the attacker
initially is quite certain (though wrong) about the value of the password and
after the experiment is rather uncertain about the value of the password; the
change from "quite certain" to "rather uncertain" is an increase in uncertainty.
So according to a metric based on reduction in uncertainty, no information flow
occurred, which is anomalous and contradicts our intuition.
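
The anomaly can be checked numerically. The following sketch computes Shannon entropy before and after the experiment in which the password is C; the prior probabilities are assumed values for illustration, and the posterior is the 50/50 belief described above.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a distribution given as dict: value -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Assumed prior belief (quite certain, but wrong when the password is C).
prior = {"A": 0.98, "B": 0.01, "C": 0.01}
# Belief after guessing A and observing a = 0.
posterior = {"A": 0.0, "B": 0.5, "C": 0.5}

# Reduction in uncertainty, the quantity an uncertainty-based metric reports.
reduction = entropy(prior) - entropy(posterior)
```

With these numbers the prior entropy is well under one bit and the posterior entropy is exactly one bit, so reduction is negative: uncertainty went up even though the attacker learned something true.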

The problem with metrics based on uncertainty is twofold. First, they do
not take accuracy into account. Accuracy and uncertainty are orthogonal
properties of the attacker's belief (being certain does not make one correct),
and as the password checking example illustrates, the amount of information flow
depends on accuracy rather than on uncertainty. Second, uncertainty-based
metrics are concerned with some unspecified agent's uncertainty rather than an
attacker's. The unspecified agent is able to observe a probability distribution
over secret input values but cannot observe the particular secret input used in
the program execution. If the attacker were the unspecified agent, there would
be no reason in general to assume that the probability distribution the attacker
uses is correct. Because the attacker's probability distribution is therefore
subjective, it must be treated as a belief. Beliefs are thus an essential
(though until now uninvestigated) component of information flow.

Chapter 2 presents a new way to quantify information flow, based on these
insights about beliefs and accuracy. We9 give a formal model for experiments,
which describe the interaction between attackers and systems by specifying
how attackers update beliefs after observing system execution. This experiment
model can be used with any mathematical representation of beliefs that
supports three natural operations (product, update, and distance); as a concrete
representation, we use probability distributions. Accordingly, we model systems
as probabilistic imperative programs. We show that the result of belief update
in the experiment model is equivalent to the attacker employing Bayesian
inference, a standard technique in applied statistics for making inferences.

9 Joint work with Andrew C. Myers and Fred B. Schneider.

Our formula for calculating information flow is based on attacker beliefs
before and after observing execution of a program. The formula is parameterized
on the belief distance function; we make the formula concrete by instantiating it
with relative entropy, which is an information-theoretic measure of the distance
between two distributions. The resulting metric for the amount of leakage of
secret information eliminates the anomaly described above, enabling
quantification of information flow for individual executions of programs when
attackers have subjective beliefs. We show that the metric correctly quantifies
"information" as defined by information theory.10 Moreover, we show that the
metric generalizes previously defined uncertainty-based metrics.
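
One way such an accuracy-based metric might be instantiated is sketched below. It rests on an assumption not spelled out in this chapter: that flow is the improvement in accuracy, where the accuracy of a belief is measured by the relative entropy between the point-mass distribution on the real secret and the belief. The names relative_entropy and flow, and the prior probabilities, are illustrative.

```python
import math


def relative_entropy(p, q):
    """D(p || q) = sum over x of p(x) * log2(p(x) / q(x)), in bits.

    Returns infinity if q assigns probability 0 to a value that p considers
    possible.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total


def flow(prior, posterior, secret):
    """Assumed metric: how much closer the belief moved to the truth, in bits."""
    truth = {x: (1.0 if x == secret else 0.0) for x in prior}
    return relative_entropy(truth, prior) - relative_entropy(truth, posterior)


# Assumed prior belief and the two posteriors from the password example.
prior = {"A": 0.98, "B": 0.01, "C": 0.01}
flow_when_c = flow(prior, {"A": 0.0, "B": 0.5, "C": 0.5}, "C")
flow_when_a = flow(prior, {"A": 1.0, "B": 0.0, "C": 0.0}, "A")
```

Under these assumptions both experiments yield a positive flow, and the surprising outcome (password C) yields far more than the confirming outcome (password A), which agrees with the intuition argued for above.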

Our metric also enables two kinds of analysis that were not previously possible.
First, it is able to analyze misinformation, which is a negative information
flow. We show that deterministic programs are incapable of producing
misinformation. Second, our metric is able to analyze repeated interactions
between an attacker and a system. This ability enables compositional reasoning
about attacks (for example, about attackers who make a series of guesses in
trying to determine a password).

We extend our experiment model to handle insiders, whose goal is to help
the attacker learn secret information. Insiders are capable of influencing
program execution, and we model them by introducing nondeterministic choice
into programs. We show that if a program satisfies observational determinism
[85, 102, 130], a noninterference policy for nondeterministic programs, then
the quantity of information flow is always zero.

Previous work on quantitative information flow has considered only
confidentiality, despite the fact that information theory itself is used to
reason about integrity. Chapter 3 addresses this gap by applying the results of
chapter 2 to integrity.11 This application enables quantification of the amount
of untrusted information with which an attacker can taint trusted information;
we name this

10 Information quantifies how surprising the occurrence of an event is. The
information (or self-information) conveyed by an event is the negative logarithm
of the probability of the event. An event that is certain (probability 1) thus
conveys zero information, and as the probability decreases, the amount of
information conveyed increases.

11 Concurrent with the work described in this dissertation, Newsome et al. [94]
also began to investigate quantitative information-flow integrity.

quantity contamination. Contamination is the information-flow dual of leakage,
and it enjoys a similar interpretation based on information theory.

Moreover, our12 investigation of information-flow integrity reveals another
connection with information theory. Recall that information theory can be used
to quantify the bandwidth, or channel capacity, of communication channels. We
model such channels with programs that take trusted inputs from a sender and
give trusted outputs to a receiver. The transmission of information to the
receiver might be decreased because a program introduces random noise into its
output that obscures the inputs, or because a program uses untrusted inputs
(supplied by an attacker) in a way that obscures the trusted inputs. In either
case, information is suppressed. We show how to quantify suppression; in
expectation, this quantity is the same as the channel capacity. We analyze
error-correcting codes [4] with suppression.

Simultaneously quantifying both confidentiality and integrity is also fruitful,
because programs sometimes sacrifice integrity of information to improve
confidentiality. For example, a statistical database that stores information
about individuals might add randomly generated noise to a query response in an
attempt to protect the privacy of those individuals. The addition of noise
suppresses information yet reduces leakage, and our quantitative frameworks make
this relationship precise: the amount of suppression plus the amount of leakage
is a constant, for a given interaction between the database and a querier.

12 Joint work with Fred B. Schneider.

Formalization of security. The CIA taxonomy is an intuitive categorization of
security requirements. Unfortunately, it is not supported by formal,
mathematical theory: There is no formalization that simultaneously characterizes
confidentiality, integrity, and availability.13 Nor are confidentiality,
integrity, and availability orthogonal: for example, the requirement that a
principal be unable to read a value could be interpreted as confidentiality or
unavailability of that value. And the CIA taxonomy provides little insight into
how to enforce security requirements, because there is no verification
methodology associated with any of the taxonomy's three categories.

This situation is similar to that of program verification circa the 1970s. Many
specific properties of interest had been identified, for example, partial
correctness, termination, total correctness, mutual exclusion, deadlock freedom,
and starvation freedom. But these properties were not all expressible in some
unifying formalism, they were not orthogonal, and there was no verification
methodology that was complete for all properties.

These problems were addressed by the development of the theory of trace
properties. A trace is a sequence of execution states, and a property either
holds or does not hold (i.e., is a Boolean function) of an object. Thus a trace
property either holds or does not hold of an execution sequence. (The extension
of a property is the set of objects for which the property holds. The extension
of a property of individual traces, that is, a set of traces, sometimes is
termed "property," too [5, 70]. But for clarity, "trace property" here denotes a
set of traces.) Every trace property is the intersection of a safety property
and a liveness property:

•  A  safety  property  is  a  trace  property  that  proscribes  "bad  things"  and  can 
be  proved  using  an  invariance  argument,  and 

13A  formalism  that  comes  close  is  that  of  Zheng  and  Myers  [131],  who  define  a  particular 
noninterference  policy  for  confidentiality,  integrity,  and  availability. 

•  a liveness property is a trace property that prescribes "good things" and can
be proved using a well-foundedness argument.14

This  categorization  forms  an  intuitively  appealing  and  orthogonal  basis  from 
which  all  trace  properties  can  be  constructed.  Moreover,  safety  and  liveness 
properties  are  affiliated  with  specific,  relatively  complete  verification  methods. 
It  is  therefore  natural  to  ask  whether  the  theory  of  properties  could  be  used  to 
formalize  security  policies. 

Unfortunately, important security policies cannot be expressed as properties
of individual execution traces of a system [2, 44, 86, 103, 115, 117, 129]. For
example, noninterference is not a property of individual traces, because whether
a trace is allowed by the policy depends on whether another trace (obtained
by deleting command executions by high users) is also allowed. For another
example, stipulating a bound on mean response time over all executions is an
availability policy that cannot be specified as a property of individual traces,
because the acceptability of delays in a trace depends on the magnitude of
delays in all other traces. However, both example policies are properties of
systems, because a system (viewed as a whole, not as individual executions)
either does or does not satisfy each policy.

The fact that security policies, like trace properties, proscribe and prescribe
behaviors of systems suggested that a theory of security policies analogous to
the theory of trace properties might exist. This dissertation develops that
theory by formalizing security policies as properties of systems, or system
properties. If systems are modeled as sets of execution traces, as with trace
properties [70],

14 Lamport [68] gave the first informal definitions of safety and liveness
properties, appropriating the names from Petri net theory, and he also gave the
first formal definition of safety [70]. Alpern and Schneider [5] gave the first
formal definition of liveness and the proof that all trace properties are the
intersection of safety and liveness properties; they later established the
correspondence of safety to invariance and of liveness to well-foundedness [6].

then the extension of a system property is a set of sets of traces or,
equivalently, a set of trace properties.15 We16 named this type of set a
hyperproperty [29]. Every property of system behavior (for systems modeled as
trace sets) can be specified as a hyperproperty, by definition. Thus,
hyperproperties can describe trace properties and moreover can describe security
policies, such as noninterference and mean response time, that trace properties
cannot.
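
For finite trace sets, the difference between the two notions of satisfaction can be sketched directly with Python sets. Everything in this sketch is an illustrative assumption: traces are tuples of integer states, the universe of traces is tiny, and the helper names (powerset, lift, and so on) are hypothetical. Representing a trace property as the hyperproperty containing all of its subsets follows from the two satisfaction checks below.

```python
from itertools import chain, combinations


def powerset(traces):
    """All subsets of a finite set of traces, each as a frozenset."""
    items = list(traces)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(s) for s in subsets}


def satisfies_trace_property(system, prop):
    """A system satisfies a trace property if every one of its traces is in it."""
    return system <= prop


def satisfies_hyperproperty(system, hyperprop):
    """A system satisfies a hyperproperty if its whole trace set is a member."""
    return system in hyperprop


def lift(prop):
    """Hyperproperty with the same satisfying systems as a trace property."""
    return powerset(prop)


# Hypothetical finite universe of traces (tuples of states).
universe = frozenset({(0,), (0, 1), (0, 2)})

# A trace property: state 2 never occurs.
never_two = frozenset(t for t in universe if 2 not in t)

# A hyperproperty: all traces of the system end in the same state.
same_final_state = {s for s in powerset(universe)
                    if len({t[-1] for t in s}) <= 1}

system = frozenset({(0,), (0, 1)})
```

The example system satisfies never_two, and equivalently lift(never_two), because each of its traces is acceptable on its own. It does not satisfy same_final_state, although each of its traces taken alone would: the hyperproperty constrains the trace set as a whole.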

Chapter 4 shows that results similar to those from the theory of trace
properties carry forward to hyperproperties:

•  Every hyperproperty is the intersection of a safety hyperproperty and a
liveness hyperproperty. (Henceforth, these terms are shortened to hypersafety
and hyperliveness.) Hypersafety and hyperliveness thus form a basis
from which all hyperproperties can be constructed.

•  Hyperproperties from a class that we introduce, called k-safety, can be
verified by using invariance arguments. Our verification methodology generalizes
prior work on using invariance arguments to verify information-flow
policies [12, 115].

However,  we  have  not  obtained  complete  verification  methods  for  hypersafety 
or  for  hyperliveness. 

The theory we develop also sheds light on the problematic status of refinement
for security policies. Refinement never invalidates a trace property but
can invalidate a hyperproperty: Consider a system π that nondeterministically
chooses to output 0, 1, or the value of a secret bit h. System π satisfies the
security policy "The possible output values are independent of the values of
secrets." But one refinement of π is the system that always outputs h, and this

15McLean  [86]  gave  the  first  formalization  of  security  policies  as  properties  of  trace  sets. 

16Joint  work  with  Fred  B.  Schneider. 
system does not satisfy the security policy. We characterize the entire set of
hyperproperties for which refinement is valid; this set includes the safety
hyperproperties.

Safety  and  liveness  not  only  form  a  basis  for  trace  properties  and  hyper¬ 
properties,  but  they  also  have  a  surprisingly  deep  mathematical  characteriza¬ 
tion  in  terms  of  topology.  In  the  Plotkin  topology  on  trace  properties,  safety 
and  liveness  are  known  to  correspond  to  closed  and  dense  sets,  respectively  [5]. 
We  generalize  this  topological  characterization  to  hyperproperties  by  showing 
that  hypersafety  and  hyperliveness  also  correspond  to  closed  and  dense  sets  in 
a  new  topology,  which  turns  out  to  be  equivalent  to  the  lower  Vietoris  construc¬ 
tion  applied  to  the  Plotkin  topology  [109].  This  correspondence  could  be  used 
to  bring  results  from  topology  to  bear  on  hyperproperties. 

Chapter  5  applies  the  theory  of  hyperproperties  to  models  of  system  execu¬ 
tion  other  than  trace  sets.  We  show  that  relational  systems,  labeled  transition 
systems,  state  machines,  and  probabilistic  systems  all  can  be  encoded  as  trace 
sets  and  handled  using  hyperproperties. 

1.3  Dissertation  Outline 

Chapter  2  presents  the  new  mathematical  model  and  metric  for  quantitative 
information  flow,  as  applied  to  confidentiality.  Chapter  3  applies  those  ideas 
to  integrity.  Chapter  4  turns  to  the  problem  of  a  mathematical  taxonomy  of 
security  policies  and  presents  the  results  on  hyperproperties.  Chapter  5  extends 
those  ideas  to  system  models  beyond  trace  sets.  Related  work  is  covered  within 
each  chapter.  Chapter  6  concludes. 
CHAPTER 2


QUANTIFICATION  OF  CONFIDENTIALITY* 

Qualitative  security  properties,  such  as  noninterference  [46],  typically  either 
prohibit  any  flow  of  information  from  a  high  security  level  to  a  lower  level, 
or  they  allow  any  information  to  flow  provided  it  passes  through  some  release 
mechanism.  For  a  program  whose  correctness  requires  flow  from  high  to  low, 
the  former  policy  is  too  restrictive  and  the  latter  can  lead  to  unbounded  leakage 
of  information.  Quantitative  confidentiality  policies,  such  as  "at  most  k  bits  leak 
per  execution  of  the  program,"  allow  information  flows  but  at  restricted  rates. 
Such  policies  are  useful  when  analyzing  programs  whose  nature  requires  that 
some — but  not  too  much — information  be  leaked,  such  as  the  password  checker 
from  chapter  1. 

Recall  that  the  amount  of  secret  information  a  program  leaks  has  tradition¬ 
ally  been  defined  using  change  in  uncertainty,  but  that  definition  leads  to  an 
anomaly  when  analyzing  the  password  checker.  We  argued  informally  in  chap¬ 
ter  1  that  accuracy  of  beliefs  provides  a  better  explanation  of  the  password 
checker.  This  chapter  substantiates  that  argument  with  formal  definitions  and 
examples. 

This  chapter  proceeds  as  follows.  Basic  representations  for  beliefs  and  pro¬ 
grams  are  stated  in  §2.1.  A  model  of  the  interaction  between  attackers  and  sys¬ 
tems,  describing  how  attackers  update  beliefs  by  observing  execution  of  pro¬ 
grams,  is  given  in  §2.2.  A  new  quantitative  flow  metric,  based  on  information 
theory,  is  defined  in  §2.3.  The  new  metric  characterizes  the  amount  of  informa¬ 
tion  flow  that  results  from  change  in  the  accuracy  of  an  attacker's  belief.  The 

*This  chapter  contains  material  from  a  previously  published  paper  [28],  which  is  ©  2005 
IEEE  and  reprinted  with  permission  from  Proceedings  of  the  18th  IEEE  Computer  Security  Founda¬ 
tions  Workshop. 
metric can also be instantiated to quantify change in uncertainty, and thus it
generalizes  previous  information-flow  metrics.  The  model  and  metric  are  for¬ 
mulated  for  use  with  any  programming  model  that  can  be  given  a  denotational 
semantics  compatible  with  the  representation  of  beliefs,  as  §2.4  illustrates  with 
a  particular  programming  language  (while-programs  plus  probabilistic  choice). 
The  model  is  extended  in  §2.5  to  programs  in  which  nondeterministic  choices 
are  resolved  by  insiders,  who  are  allowed  to  observe  secret  values.  Related 
work is discussed in §2.6, and §2.7 concludes. Most proofs are delayed from the
main body to appendix 2.A.

2.1  Incorporating  Beliefs 

A  belief  is  a  statement  an  agent  makes  about  the  state  of  the  world,  accompanied 
by  some  characterization  of  how  certain  the  agent  is  about  the  truthfulness  of 
the  statement.  Our  agents  will  reason  about  probabilistic  programs,  so  we  begin 
by  developing  mathematical  structures  for  representing  programs  and  beliefs. 

2.1.1  Distributions 

A frequency distribution is a function $\delta$ that maps a program state to a frequency,
which is a non-negative real number. A frequency distribution is essentially an
unnormalized  probability  distribution  over  program  states;  it  is  easier  to  define 
a  programming  language  semantics  by  using  frequency  distributions  than  by 
using  probability  distributions  [101].  Henceforth,  we  write  "distribution"  to 
mean  "frequency  distribution." 

The  set  of  all  program  states  is  State,  and  the  set  of  all  distributions  is  Dist. 
The  structure  of  State  is  mostly  unimportant;  it  can  be  instantiated  according  to 
the needs of any particular language or system. For our examples, states map
variables to values, where Var and Val are both countable sets:

\[
v \in \mathit{Var}, \qquad
\sigma \in \mathit{State} = \mathit{Var} \to \mathit{Val}, \qquad
\delta \in \mathit{Dist} = \mathit{State} \to \mathbb{R}^{+}.
\]

We write a state as a list of mappings; for example, $(g \mapsto A, a \mapsto 0)$ is a state in
which variable $g$ has value A and $a$ has value 0.

The mass $\|\delta\|$ in a distribution $\delta$ is the sum of frequencies:¹

\[ \|\delta\| \triangleq (\Sigma\, \sigma : \delta(\sigma)). \]

A probability distribution has mass 1, but a frequency distribution may have
any non-negative mass. A point mass is a probability distribution that maps a
single state to 1. It is denoted by placing a dot over that single state:

\[ \dot{\sigma} = \lambda \sigma' .\ \text{if } \sigma' = \sigma \text{ then } 1 \text{ else } 0. \]
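As an illustration only, the definitions above can be sketched in Python. The encoding is an assumption made for this sketch and is not part of the model: a distribution is a dictionary from hashable states to frequencies, and the helper names `mass`, `point_mass`, and `freq` are hypothetical.

```python
from typing import Dict, Hashable

State = Hashable
Dist = Dict[State, float]  # frequency distribution: state -> non-negative frequency


def mass(delta: Dist) -> float:
    """Return the mass ||delta||, the sum of all frequencies."""
    return sum(delta.values())


def point_mass(sigma: State) -> Dist:
    """Return the probability distribution that maps the single state sigma to 1."""
    return {sigma: 1.0}


def freq(delta: Dist, sigma: State) -> float:
    """Look up delta(sigma); states absent from the dictionary have frequency 0."""
    return delta.get(sigma, 0.0)


# The state (g -> A, a -> 0), encoded here as a tuple of (variable, value) pairs.
example_state = (("g", "A"), ("a", 0))
example_point = point_mass(example_state)
```

A point mass has mass 1, whereas a dictionary such as `{"s": 0.25, "t": 0.5}` is a frequency distribution with mass 0.75 that is not a probability distribution.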


2.1.2  Programs 

Execution of program $S$ is described by a denotational semantics in which the
meaning $[\![S]\!]$ of $S$ is a function of type $\mathit{State} \to \mathit{Dist}$. This semantics describes
the frequency of termination in a given state: if $[\![S]\!]\sigma = \delta$, then the frequency
that $S$ terminates in $\sigma'$ when begun in $\sigma$ is $\delta(\sigma')$. This semantics can be lifted to
a function of type $\mathit{Dist} \to \mathit{Dist}$ by the following definition:

\[ [\![S]\!]\delta = (\Sigma\, \sigma : \delta(\sigma) \cdot [\![S]\!]\sigma). \]

¹Formula $(\star\, x \in D : R : P)$ is a quantification in which $\star$ is the quantifier (such as $\forall$ or $\Sigma$), $x$
is the variable that is bound in $R$ and $P$, $D$ is the domain of $x$, $R$ is the range, and $P$ is the body.
We omit $D$, $R$, and even $x$ when they are clear from context; an omitted range means $R = \mathit{true}$.
Thus, the meaning of S given a distribution on inputs is completely determined
by  the  meaning  of  S  given  a  state  as  input.  By  defining  programs  in  terms  of 
how  they  operate  on  distributions,  we  enable  analysis  of  probabilistic  programs. 
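The lifting from states to distributions can be sketched as follows. This is a minimal illustration that assumes distributions are dictionaries from states to frequencies; the function `lift` and the example program `coin_add` are hypothetical names introduced here, not constructs from the text.

```python
from collections import defaultdict


def lift(sem):
    """Lift sem : State -> Dist to Dist -> Dist.

    The lifted meaning weights the output distribution of each input state
    by the frequency of that input state and sums the results.
    """
    def lifted(delta):
        out = defaultdict(float)
        for sigma, f in delta.items():
            for sigma_out, g in sem(sigma).items():
                out[sigma_out] += f * g
        return dict(out)
    return lifted


def coin_add(n):
    """Hypothetical program on integer states: add a fair coin flip to the state."""
    return {n: 0.5, n + 1: 0.5}


lifted_result = lift(coin_add)({0: 0.5, 1: 0.5})
```

Here the input distribution places frequency 0.5 on each of the states 0 and 1, and the lifted meaning combines the two per-state output distributions into a single output distribution.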

Our  examples  use  while-programs  extended  with  a  probabilistic  choice  con¬ 
struct.  Let  metavariables  S,  v,  E,  and  B  range  over  programs,  variables,  arith¬ 
metic  expressions,  and  Boolean  expressions,  respectively.  Evaluation  of  expres¬ 
sions  is  assumed  side-effect  free,  but  we  do  not  otherwise  prescribe  their  syntax 
or  semantics.  The  syntax  of  the  language  is  as  follows: 

S ::= skip | v := E | S; S | if B then S else S
    | while B do S | S p[] S

The operational semantics for the deterministic subset of this language is standard. Probabilistic choice $S_1 \mathbin{{}_{p}[]} S_2$ executes $S_1$ with probability $p$ or $S_2$ with
probability $1 - p$, where $0 < p < 1$. A denotational semantics for this language
is  given  in  §2.4. 

2.1.3  Labels  and  Projections 

We  need  a  way  to  identify  secret  data;  confidentiality  labels  serve  this  purpose. 
For  simplicity,  assume  there  are  only  two  labels:  a  label  L  that  indicates  low- 
confidentiality  (public)  data,  and  a  label  H  that  indicates  high-confidentiality 
(secret) data. Assume that $\mathit{State}$ is a product of two domains $\mathit{State}_L$ and $\mathit{State}_H$,
which contain the low- and high-labeled data, respectively. A low state is an
element $\sigma_L \in \mathit{State}_L$; a high state is an element $\sigma_H \in \mathit{State}_H$. The projection
of state $\sigma \in \mathit{State}$ onto $\mathit{State}_L$ is denoted $\sigma \upharpoonright L$; this is the part of $\sigma$ visible to
the attacker. Projection onto $\mathit{State}_H$, the part of $\sigma$ not visible to the attacker, is
denoted $\sigma \upharpoonright H$.
Each variable in a program is subscripted by a label to indicate the confidentiality
of the information stored in that variable; for example, $x_L$ is a variable
that contains low information. For convenience, let variable $l$ be labeled $L$ and
variable $h$ be labeled $H$. $\mathit{Var}_L$ is the set of variables in a program that are labeled
$L$, so $\mathit{State}_L = \mathit{Var}_L \to \mathit{Val}$. The low projection $\sigma \upharpoonright L$ of state $\sigma$ is

\[ \sigma \upharpoonright L = \lambda v \in \mathit{Var}_L .\ \sigma(v). \]

States $\sigma$ and $\sigma'$ are low-equivalent, written $\sigma =_L \sigma'$, if they have the same low
projection:

\[ \sigma =_L \sigma' \;\triangleq\; (\sigma \upharpoonright L) = (\sigma' \upharpoonright L). \]

Distributions also have projections. Let $\delta$ be a distribution and $\sigma_L$ a low state.
Then $(\delta \upharpoonright L)(\sigma_L)$ is the combined frequency of those states whose low projection
is $\sigma_L$:

\[ \delta \upharpoonright L = \lambda \sigma_L \in \mathit{State}_L .\ (\Sigma\, \sigma' : (\sigma' \upharpoonright L) = \sigma_L : \delta(\sigma')). \]

High  projection  and  high  equivalence  are  defined  by  replacing  occurrences  of  L 
with  H  in  the  definitions  above. 
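The projections defined above can be sketched in Python under an assumed encoding: a state is a frozenset of (variable, value) pairs, and a label is represented by the set of variable names that carry it. The helper names `state`, `project`, `low_equivalent`, and `project_dist` are hypothetical.

```python
from collections import defaultdict


def state(**bindings):
    """Build a hashable state from variable bindings, e.g. state(h=1, l=0)."""
    return frozenset(bindings.items())


def project(sigma, label_vars):
    """State projection: keep only the bindings for variables in label_vars."""
    return frozenset((v, x) for v, x in sigma if v in label_vars)


def low_equivalent(sigma1, sigma2, low_vars):
    """Two states are low-equivalent when their low projections coincide."""
    return project(sigma1, low_vars) == project(sigma2, low_vars)


def project_dist(delta, label_vars):
    """Distribution projection: sum the frequencies of all states that
    share the same projection."""
    out = defaultdict(float)
    for sigma, f in delta.items():
        out[project(sigma, label_vars)] += f
    return dict(out)


LOW = {"l"}  # variables labeled L in this example
delta = {state(h=0, l=1): 0.25, state(h=1, l=1): 0.25, state(h=1, l=0): 0.5}
low_delta = project_dist(delta, LOW)
```

High projection is obtained by passing the set of high variables (here `{"h"}`) instead of `LOW`.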

2.1.4  Belief  Representation 

To  be  usable  in  our  framework,  a  belief  representation  must  support  certain 
natural operations. Let $b$ and $b'$ be beliefs ranging over sets of possible worlds $W$
and $W'$, respectively, where a possible world is some elementary outcome about
which beliefs can be held [52].

1. Belief product $\otimes$ combines $b$ and $b'$ into a new belief $b \otimes b'$ about possible
worlds $W \times W'$, where $W$ and $W'$ are disjoint.

2. Belief update $b|U$ is the belief that results when $b$ is updated to include new
information that the actual world is in a set $U \subseteq W$ of possible worlds.

3. Belief distance $D(b \to b')$ is a real number $r \geq 0$ that quantifies differences
between $b$ and $b'$.

Although  the  results  in  this  chapter  are,  for  the  most  part,  independent  of 
any  particular  representation,  the  rest  of  this  chapter  uses  distributions  to  rep¬ 
resent  beliefs.  High  states  are  the  possible  worlds  for  beliefs,  and  a  belief  is  a 
probability  distribution  over  high  states: 

\[ b \in \mathit{Belief} = \mathit{State}_H \to \mathbb{R}^{+}, \quad \text{s.t. } \|b\| = 1. \]

Thus,  beliefs  correspond  to  probability  measures.  Probability  measures  are 
well-studied  as  a  belief  representation  [52],  and  they  have  several  advantages 
here:  they  are  familiar,  quantitative,  support  the  operations  required  above,  and 
admit  a  programming  language  semantics  (as  shown  in  §2.4).  There  is  also  a 
nice justification for the numbers they produce: roughly, $b(\sigma)$ characterizes the
amount of money an attacker should be willing to bet that $\sigma$ is the actual state
of the system [52]. Other choices of belief representation could include belief
functions or sets of probability measures [52]. Although these alternatives are
more  expressive  than  probability  measures,  it  is  more  complicated  to  define  the 
required  operations  for  them. 

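For belief product $\otimes$, we employ a distribution product $\otimes$ of two distributions
$\delta_1 : A \to \mathbb{R}^{+}$ and $\delta_2 : B \to \mathbb{R}^{+}$, with $A$ and $B$ disjoint:

\[ \delta_1 \otimes \delta_2 = \lambda(\sigma_1, \sigma_2) \in A \times B .\ \delta_1(\sigma_1) \cdot \delta_2(\sigma_2). \]

It is easy to check that if $b$ and $b'$ are beliefs, $b \otimes b'$ is too.

For belief update $|$, we use distribution conditioning:

\[ \delta|U = \lambda\sigma .\ \text{if } \sigma \in U \text{ then } \frac{\delta(\sigma)}{(\Sigma\, \sigma' \in U : \delta(\sigma'))} \text{ else } 0. \]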
For belief distance $D$ we use relative entropy, an information-theoretic metric [59]
for the distance between distributions:

\[ D(b \to b') \triangleq \left(\Sigma\, \sigma : b'(\sigma) \cdot \lg \frac{b'(\sigma)}{b(\sigma)}\right). \]

The base of the logarithm in $D$ can be chosen arbitrarily; we use base 2 and write
lg to indicate $\log_2$, making bits the unit of measurement for distance. The relative
entropy of $b$ to $b'$ is the expected inefficiency (that is, the number of additional
bits that must be sent) of an optimal code that is constructed by assuming an
inaccurate distribution over symbols $b$ when the real distribution is $b'$ [32]. Like
an analytic metric, $D(b \to b')$ is always at least zero and $D(b \to b')$ equals zero
only when $b = b'$.²
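The three belief operations can be sketched in Python as follows. This is an illustration under the assumption that beliefs are dictionaries from states to probabilities; the function names `product`, `condition`, and `distance` are hypothetical, and the numbers in the example are the prebelief values from the password checker discussion.

```python
import math


def product(d1, d2):
    """Distribution product: the frequency of a pair is the product of frequencies."""
    return {(s1, s2): f1 * f2 for s1, f1 in d1.items() for s2, f2 in d2.items()}


def condition(delta, U):
    """Distribution conditioning delta|U: zero out states outside U and
    normalize by the mass of delta inside U."""
    total = sum(f for s, f in delta.items() if s in U)
    if total == 0:
        raise ValueError("cannot condition on a set with zero mass")
    return {s: (f / total if s in U else 0.0) for s, f in delta.items()}


def distance(b, b_prime):
    """Relative entropy D(b -> b') in bits: sum of b'(s) * lg(b'(s) / b(s))."""
    total = 0.0
    for s, q in b_prime.items():
        if q == 0:
            continue  # terms with b'(s) = 0 contribute nothing
        p = b.get(s, 0.0)
        if p == 0:
            return math.inf  # b rules out a state that b' regards as possible
        total += q * math.log2(q / p)
    return total


belief = {"A": 0.98, "B": 0.01, "C": 0.01}
updated = condition(belief, {"B", "C"})
```

Conditioning `belief` on the set `{"B", "C"}` removes the mass on A and splits the remaining mass evenly between B and C.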

Relative entropy has the property that if $b'(\sigma) > 0$ and $b(\sigma) = 0$, then
$D(b \to b') = \infty$. Intuitively, $b'$ is "infinitely surprising" because it regards $\sigma$
as possible whereas $b$ regards $\sigma$ as impossible. To avoid this anomaly, beliefs
may  be  required  to  satisfy  an  admissibility  restriction,  which  ensures  that  attack¬ 
ers  do  not  initially  believe  that  certain  states  are  impossible.  For  example,  a 
belief might be restricted such that it never differs by more than a factor of $\epsilon$
from a uniform distribution. This restriction could be useful with the password
checker (cf. §1.2) if it is reasonable to assume that attackers believe that all passwords
are nearly equally likely. Or, the attacker's belief may be required to
be  a  maximum  entropy  distribution  [32]  with  respect  to  attacker-specified  con¬ 
straints.  This  restriction  could  be  useful  with  the  password  checker  if  attackers 
believe  that  passwords  are  English  words  (which  is  a  kind  of  constraint).  Other 
admissibility  restrictions  can  be  substituted  for  these  when  stronger  assump¬ 
tions  can  be  made  about  attacker  beliefs. 

²Unlike an analytic metric, $D$ does not satisfy symmetry or the triangle inequality. However,
it  seems  unreasonable  to  assume  that  either  of  these  properties  holds  for  beliefs,  since  it  can  be 
easier  to  rule  out  a  possibility  from  a  belief  than  to  add  a  new  possibility,  or  vice-versa. 
Figure 2.1: Channels in confidentiality experiment

2.2  Confidentiality  Experiments 

We formalize as a confidentiality experiment (or simply an experiment) how an
attacker, an agent that reasons about secret data, revises his beliefs from interaction
with a program that is executed by a system. The attacker should not learn
about  the  high  input  to  the  program  but  is  allowed  to  observe  and  influence 
low  inputs  and  outputs.  Other  agents  (a  system  operator,  other  users  of  the 
system  with  their  own  high  data,  an  informant  upon  which  the  attacker  relies, 
etc.)  might  be  involved  when  an  attacker  interacts  with  a  system;  however,  it 
suffices  to  condense  all  of  these  to  just  the  attacker  and  the  system.  The  channels 
between  agents  and  the  program  are  depicted  in  figure  2.1  and  are  described  in 
detail  below. 

We  conservatively  assume  that  the  attacker  knows  the  code  of  the  program 
with  which  he  interacts.  For  simplicity,  we  assume  that  the  program  always 
terminates  and  that  it  never  modifies  the  high  state.  Both  restrictions  can  be 
lifted  without  significant  changes,  as  shown  in  §2.2.4. 

2.2.1  Confidentiality  Experiment  Protocol 

Formally, an experiment $\mathcal{E}$ is described by a tuple,

\[ \mathcal{E} = (S, b_H, \sigma_H, \sigma_L), \]
An experiment $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$ is conducted as follows.

1. The attacker chooses a prebelief $b_H$ about the high state.

2. (a) The system picks a high state $\sigma_H$.

   (b) The attacker picks a low state $\sigma_L$.

3. The attacker predicts the output distribution: $\delta'_A = [\![S]\!](\dot{\sigma}_L \otimes b_H)$.

4. The system executes program $S$, which produces a state $\sigma' \in \delta'$ as output,
where $\delta' = [\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}_H)$. The attacker observes the low projection of the
output state: $o = \sigma' \upharpoonright L$.

5. The attacker infers a postbelief: $b'_H = (\delta'_A | o) \upharpoonright H$.

Figure 2.2: Experiment protocol


where $S$ is the program, $b_H$ is the attacker's belief at the beginning of the experiment,
$\sigma_H$ is the high projection of the initial state, and $\sigma_L$ is the low projection of
the initial state. The protocol for experiments, which uses some notation defined
below, is summarized in figure 2.2. Here is a justification for the protocol.

An attacker's prebelief $b_H$, describing his belief at the beginning of the experiment
(step 1), may be chosen arbitrarily (subject to an admissibility restriction
as in §2.1.4) or may be informed by previous experiments. In a series of experiments,
the postbelief from one experiment typically becomes the prebelief
to the next. The attacker might even choose a prebelief $b_H$ that contradicts his
true subjective probability distribution for the state, and this gives our analysis
additional power by allowing the attacker to conduct experiments to answer
questions such as "What would happen if I were to believe $b_H$?"

The system chooses $\sigma_H$ (step 2a), the high projection of the initial state, and
this part of the state might remain constant from one experiment to the next
or might vary. For example, Unix passwords do not usually change frequently,
but the output displayed on an RSA SecurID token changes each minute. We
conservatively assume that the attacker chooses all of $\sigma_L$ (step 2b), the low projection
of the initial state. This gives the attacker additional power in controlling
execution of the program, which he can use to attempt to maximize the amount
of information flow. The attacker's choice of $\sigma_L$ is thus likely to be influenced
by $b_H$, but for generality, we do not require there be such a strategy.

Using the semantics of $S$ along with prebelief $b_H$ as a distribution on high
input, the attacker conducts a "thought experiment" to generate a prediction of
the output distribution (step 3). We define prediction $\delta'_A$ to correlate the output
state with the high input state:

\[ \delta'_A = [\![S]\!](\dot{\sigma}_L \otimes b_H). \]

Program $S$ is executed (step 4) only once in each experiment; multiple executions
are modeled by multiple experiments. The meaning of $S$ given inputs
$\sigma_L$ and $\sigma_H$ is an output distribution $\delta'$:

\[ \delta' = [\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}_H). \]

From $\delta'$ the attacker makes an observation, which is a low projection of an output
state. Probabilistic programs may yield many possible output states, but in a
single execution of the program, only one output state is actually produced.
This output state $\sigma'$ is produced with frequency $\delta'(\sigma')$. We write $\sigma' \in \delta'$ to denote
that $\sigma'$ is in the support of (i.e., has positive frequency according to) $\delta'$. In a single
experiment, the attacker is allowed only a single observation. The observation
$o$ resulting from $\sigma'$ is $\sigma' \upharpoonright L$.

Finally, the attacker incorporates any new inferences that can be made from
observation $o$ by conditioning prediction $\delta'_A$. The result is projected to $H$ to
produce the attacker's postbelief $b'_H$ (step 5):

\[ b'_H = (\delta'_A | o) \upharpoonright H. \]

Here, conditioning operator $\delta|o$ is defined in terms of conditioning operator $\delta|U$.
The new operator removes all mass in distribution $\delta$ that is inconsistent with
observation $o$, then normalizes the result:

\[
\begin{aligned}
\delta|o &= \delta \,|\, \{\sigma' \mid \sigma' \upharpoonright L = o\} \\
         &= \lambda\sigma .\ \text{if } (\sigma \upharpoonright L) = o \text{ then } \frac{\delta(\sigma)}{(\Sigma\, \sigma' : (\sigma' \upharpoonright L) = o : \delta(\sigma'))} \text{ else } 0.
\end{aligned}
\]
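Steps 3 through 5 of the protocol can be sketched in Python. The sketch rests on several assumptions that are not part of the text: states are frozensets of (variable, value) pairs, the product of a point mass on the low state with a belief is encoded as a union of state fragments, the program terminates and never modifies the high state, and the example program `parity` is hypothetical. All function names are illustrative.

```python
import random
from collections import defaultdict


def project(sigma, variables):
    """Keep only the bindings of sigma whose variable is in `variables`."""
    return frozenset((v, x) for v, x in sigma if v in variables)


def lift(sem, delta):
    """Apply sem : State -> Dist to an input distribution delta."""
    out = defaultdict(float)
    for sigma, f in delta.items():
        for sigma_out, g in sem(sigma).items():
            out[sigma_out] += f * g
    return dict(out)


def run_experiment(sem, low_vars, high_vars, b_H, sigma_H, sigma_L, rng):
    """Steps 3-5; steps 1-2 correspond to the arguments b_H, sigma_H, sigma_L."""
    # Step 3: prediction from (point mass at sigma_L) combined with b_H.
    prior = {sigma_L | h: p for h, p in b_H.items()}
    prediction = lift(sem, prior)
    # Step 4: the system runs S once on the real input; one output is sampled.
    actual = sem(sigma_L | sigma_H)
    outputs = list(actual)
    sigma_out = rng.choices(outputs, weights=[actual[s] for s in outputs])[0]
    o = project(sigma_out, low_vars)
    # Step 5: condition the prediction on o, then project to high.
    consistent = {s: f for s, f in prediction.items() if project(s, low_vars) == o}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("observation has zero frequency under the prediction")
    post = defaultdict(float)
    for s, f in consistent.items():
        post[project(s, high_vars)] += f / total
    return o, dict(post)


def parity(sigma):
    """Hypothetical program l := h mod 2 (deterministic, never modifies h)."""
    env = dict(sigma)
    env["l"] = env["h"] % 2
    return {frozenset(env.items()): 1.0}


def high(h):
    return frozenset({("h", h)})


prebelief = {high(h): 0.25 for h in range(4)}
observation, postbelief = run_experiment(
    parity, {"l"}, {"h"}, prebelief, high(2), frozenset({("l", 0)}), random.Random(0)
)
```

With a uniform prebelief over four high values and true input 2, the attacker observes that the low output is 0 and ends with a postbelief that is uniform over the two even values.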

2.2.2  Password  Checking  as  an  Experiment 

Our  experiment  model  allows  the  informal  reasoning  in  §1.2  to  be  made  pre¬ 
cise.  For  example,  consider  the  password  checker;  adding  confidentiality  labels 
yields: 

PWC : if $p_H = g_L$ then $a_L := 1$ else $a_L := 0$

The attacker begins an experiment by choosing prebelief $b_H$, perhaps as specified
in the column labeled $b_H$ of table 2.1. Next, the system chooses initial high
projection $\sigma_H$, and the attacker chooses initial low projection $\sigma_L$. In the first
experiment in §1.2, the password was A, so the system chooses $\sigma_H = (p \mapsto A)$.
Similarly, the attacker chooses $\sigma_L = (g \mapsto A, a \mapsto 0)$. (The initial value of $a$ is
actually irrelevant, since it is never used by the program and $a$ is set along all
control paths.) Next, the system executes PWC. Output distribution $\delta'$ should
be the point mass at state $\sigma' = (p \mapsto A, g \mapsto A, a \mapsto 1)$; the semantics in §2.4 will
validate this intuition. Since $\sigma'$ is the only state that can be sampled from $\delta'$, the
attacker's observation $o_1$ is $\sigma' \upharpoonright L = (g \mapsto A, a \mapsto 1)$.

Finally, the attacker infers a postbelief. He conducts a thought experiment,
predicting an output distribution $\delta'_A = [\![\mathrm{PWC}]\!](\dot{\sigma}_L \otimes b_H)$, given in table 2.2. The
ellipsis in the final row of the table indicates that all states not shown have
frequency 0. This distribution is intuitively correct: the attacker believes that he
has a 98% chance of being authenticated, whereas 1% of the time he will fail to
Table 2.1: Beliefs about $p_H$

$p_H$    $b_H$    $b'_{H1}$    $b'_{H2}$
A        0.98     1            0
B        0.01     0            0.5
C        0.01     0            0.5


Table 2.2: Distributions on PWC output

$p$    $g$    $a$    $\delta'_A$    $\delta'_A|o_1$    $\delta'_A|o_2$
A      A      0      0              0                  0
A      A      1      0.98           1                  0
B      A      0      0.01           0                  0.5
B      A      1      0              0                  0
C      A      0      0.01           0                  0.5
C      A      1      0              0                  0
...    ...    ...    0              0                  0

be authenticated because the password is B, and another 1% because it is C. The
attacker conditions prediction $\delta'_A$ on observation $o_1$, obtaining $\delta'_A|o_1$, also shown
in table 2.2. Projecting to high yields the attacker's postbelief, $b'_{H1}$, shown in
table 2.1. This postbelief is what the informal reasoning in §1.2 suggested: the
attacker is certain that the password is A.

The second experiment in §1.2 can also be formalized. In it, $b_H$ and $\sigma_L$ remain
the same as before, but $\sigma_H$ becomes $(p \mapsto C)$. Observation $o_2$ is therefore
the point mass at $(g \mapsto A, a \mapsto 0)$. Prediction $\delta'_A$ remains unchanged, and conditioned
on $o_2$ it becomes $\delta'_A|o_2$, shown in table 2.2. Projecting to high yields
postbelief $b'_{H2}$ from table 2.1. This postbelief again agrees with the informal
reasoning: the attacker believes that there is a 50% chance each for the password to
be B or C.
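The two PWC experiments can be replayed with a short Python sketch. It assumes an encoding chosen for this illustration only: a state is a triple (p, g, a), the low projection is the pair (g, a), and the function names `pwc` and `pwc_postbelief` are hypothetical. The prebelief values are those in the $b_H$ column of table 2.1.

```python
def pwc(sigma):
    """PWC on a state (p, g, a): set a to 1 if the guess equals the password, else 0."""
    p, g, _ = sigma
    return (p, g, 1 if p == g else 0)


def pwc_postbelief(prebelief, guess, true_password):
    """Run one experiment on PWC and return the attacker's postbelief about p."""
    # Prediction: PWC is deterministic, so each believed password p yields
    # exactly one output state, which inherits the probability of p.
    prediction = {pwc((p, guess, 0)): pr for p, pr in prebelief.items()}
    # Observation: low projection (g, a) of the output on the real password.
    _, g_out, a_out = pwc((true_password, guess, 0))
    o = (g_out, a_out)
    # Condition the prediction on o, then project to the high variable p.
    consistent = {s: f for s, f in prediction.items() if (s[1], s[2]) == o}
    total = sum(consistent.values())
    return {p: consistent.get((p, o[0], o[1]), 0.0) / total for p in prebelief}


b_H = {"A": 0.98, "B": 0.01, "C": 0.01}
b_H1 = pwc_postbelief(b_H, guess="A", true_password="A")  # first experiment
b_H2 = pwc_postbelief(b_H, guess="A", true_password="C")  # second experiment
```

The two results correspond to the postbelief columns of table 2.1: certainty that the password is A in the first experiment, and an even split between B and C in the second.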
2.2.3 Bayesian Belief Revision


The  formula  the  attacker  uses  to  infer  a  postbelief  is  an  application  of  Bayesian 
inference,  which  is  a  standard  technique  used  in  applied  statistics  for  making 
inferences  when  uncertainty  is  made  explicit  through  probability  models  [45]. 
The attacker therefore reasons rationally, according to Halpern's rationality
axioms [52], though the literature on human behavior shows that this is not always
the same as human reasoning [60,64].

Let belief revision operator $\mathcal{B}$ yield the postbelief from an experiment
$\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$, given observation $o$:

\[ \mathcal{B}(\mathcal{E}, o) \triangleq (([\![S]\!](\dot{\sigma}_L \otimes b_H)) | o) \upharpoonright H. \]


We write $b'_H \in \mathcal{B}(\mathcal{E})$ to denote that there exists some $o$ for which $b'_H = \mathcal{B}(\mathcal{E}, o)$.

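Recall Bayes' rule for updating a hypothesis $\mathit{Hyp}$ with an observation $\mathit{obs}$:

\[ \Pr(\mathit{Hyp} \mid \mathit{obs}) = \frac{\Pr(\mathit{Hyp}) \Pr(\mathit{obs} \mid \mathit{Hyp})}{(\Sigma\, \mathit{Hyp}' : \Pr(\mathit{Hyp}') \Pr(\mathit{obs} \mid \mathit{Hyp}'))}. \]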

In our model, the attacker's hypothesis is about the values of high states, so
the domain of hypotheses is $\mathit{State} \upharpoonright H$. Therefore $\Pr(\mathit{Hyp})$, the probability the
attacker ascribes to a particular hypothesis, is modeled by $b_H(\sigma_H)$. The probability
$\Pr(\mathit{obs} \mid \mathit{Hyp})$ the attacker ascribes to an observation given the assumed
truth of a hypothesis is modeled by the program semantics: the probability of
observation $o$ given an assumed high input $\sigma_H$ is $([\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}_H) \upharpoonright L)(o)$.

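Given experiment $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$, instantiating Bayes' rule on these
probabilities yields Bayesian inference $\mathrm{BI}(\mathcal{E}, o)$, which is $\Pr(\sigma_H \mid o)$:

\[ \mathrm{BI}(\mathcal{E}, o) = \frac{b_H(\sigma_H) \cdot ([\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}_H) \upharpoonright L)(o)}{(\Sigma\, \sigma'_H : b_H(\sigma'_H) \cdot ([\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}'_H) \upharpoonright L)(o))}. \]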

With  this  instantiation,  we  can  show  that  the  experiment  protocol  leads  an  at¬ 
tacker  to  update  his  belief  according  to  Bayesian  inference: 
Theorem 2.1. $(\mathcal{B}(\mathcal{E}, o))(\sigma_H) = \mathrm{BI}(\mathcal{E}, o)$.


Proof. In appendix 2.A. □
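The agreement stated by theorem 2.1 can be checked numerically on a small example. The following Python sketch is only a sanity check on one hypothetical program, not a proof: the noisy channel `noisy_copy`, the encoding of output states as pairs (h, l), and all function names are assumptions introduced for illustration. The low input is omitted because the example program ignores it.

```python
def noisy_copy(h):
    """Hypothetical program: copy bit h to l with probability 0.75, else flip it.
    Returns a distribution over output states (h, l)."""
    return {(h, h): 0.75, (h, 1 - h): 0.25}


def revise(b_H, o):
    """Belief revision: predict, condition on low observation o, project to high."""
    prediction = {}
    for h, p in b_H.items():
        for s, f in noisy_copy(h).items():
            prediction[s] = prediction.get(s, 0.0) + p * f
    total = sum(f for (_, l), f in prediction.items() if l == o)
    return {
        h: sum(f for (h2, l), f in prediction.items() if h2 == h and l == o) / total
        for h in b_H
    }


def likelihood(h, o):
    """Probability of low observation o given an assumed high input h."""
    return sum(f for (_, l), f in noisy_copy(h).items() if l == o)


def bayes(b_H, o):
    """Bayesian inference: prior times likelihood, normalized over all hypotheses."""
    norm = sum(b_H[h2] * likelihood(h2, o) for h2 in b_H)
    return {h: b_H[h] * likelihood(h, o) / norm for h in b_H}


prior = {0: 0.5, 1: 0.5}
by_revision = revise(prior, 1)
by_bayes = bayes(prior, 1)
```

For this program and prior, both computations assign probability 0.75 to the hypothesis that the high bit equals the observed low bit.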

2.2.4  Mutable  High  Inputs  and  Nontermination 

Two  simplifying  assumptions  about  programs  were  invoked  by  §2.2.1:  pro¬ 
grams  never  modify  high  input,  and  they  always  terminate.  We  now  dispense 
with  these  technical  issues. 

Mutable high inputs. If program $S$ were to modify the high state, the attacker's
prediction $\delta'_A$ would correlate high outputs with low outputs. However,
to calculate a postbelief (in step 5), $\delta'_A$ must correlate high inputs with low
outputs. So our experiment protocol requires the high input state be preserved
in $\delta'_A$.

Informally,  we  can  do  this  by  keeping  a  copy  of  the  initial  high  inputs  in  the 
program  state.  This  copy  is  never  modified  by  the  program.  Thus,  the  copy  is 
preserved  in  the  final  output  state,  and  the  attacker  can  again  establish  a  corre¬ 
lation  between  high  inputs  and  low  outputs. 

Formally, let the notation $b_H^0$ mean the same distribution as $b_H$, except that
each state of its domain has a 0 as a superscript. So, if $b_H$ ascribes probability
$p$ to state $\sigma$, then $b_H^0$ ascribes probability $p$ to the state $\sigma^0$. We assume that $S$
cannot modify states with a superscript 0. In the case that states map variables to
values, this could be achieved by defining $\sigma^0$ to be the same state as $\sigma$, but with
the superscript 0 attached to variables; for example, if $\sigma(v) = 1$ then $\sigma^0(v^0) = 1$.
Note that $S$ cannot modify $\sigma^0$ if $S$ did not originally contain any variables with
superscripts.
Using this notation, the belief revision operator is extended to $\mathcal{B}'$, which allows
$S$ to modify the high state in experiment $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$:

\[ \mathcal{B}'(\mathcal{E}, o) \triangleq (([\![S]\!](\dot{\sigma}_L \otimes b_H \otimes b_H^0)) | o) \upharpoonright H^0. \]

In this definition, the high input state is preserved by introducing the product
with $b_H^0$, and the attacker's postbelief about the input is recovered by restricting
to $H^0$, the high input state with the superscript 0.

Nontermination. To eliminate the second assumption, note that program $S$
must terminate for an attacker to obtain a low state as an observation when
executing $S$. If the attacker has an oracle that decides nontermination,³ then
nontermination can be modeled in the standard denotational style with a state
$\bot$ representing divergence, as follows.

Let $\mathit{State}_\bot = \mathit{State} \cup \{\bot\}$, and $\bot \upharpoonright L = \bot$. Nontermination is now allowed as
an observation, leading to an extended belief revision operator $\mathcal{B}'_\bot$:

\[ \mathcal{B}'_\bot(\mathcal{E}, o) \triangleq (\mathit{out}_\bot(S, \dot{\sigma}_L \otimes b_H \otimes b_H^0) | o) \upharpoonright H^0. \]

3An  attacker  that  cannot  detect  nontermination  is  more  difficult  to  model.  At  some  point 
during  the  execution  of  the  program,  he  can  stop  waiting  for  the  program  to  terminate  and 
declare  that  he  has  observed  nontermination.  However,  he  might  be  incorrect  in  doing  so — 
leading  to  beliefs  about  nontermination  and  instruction  timings.  The  interaction  of  these  beliefs 
with  beliefs  about  high  inputs  would  be  complex;  we  do  not  address  it  here. 
Observation $o$ is now produced from output distribution $\delta' = \mathit{out}_\bot(S, \dot{\sigma}_L \otimes \dot{\sigma}_H)$.
Function $\mathit{out}_\bot(S, \delta)$ produces a distribution which yields the frequency that $S$
terminates, or fails to terminate, on input distribution $\delta$:

\[ \mathit{out}_\bot(S, \delta) = \lambda\sigma : \mathit{State}_\bot .\ \text{if } \sigma = \bot \text{ then } \|\delta\| - \|[\![S]\!]\delta\| \text{ else } ([\![S]\!]\delta)(\sigma). \]

If $S$ does not terminate on some input states in $\delta$, output distribution $[\![S]\!]\delta$ will
contain less mass than $\delta$; otherwise, $\|\delta\| = \|[\![S]\!]\delta\|$. Missing mass corresponds to
nontermination [83,101], so $\mathit{out}_\bot$ maps the missing mass to $\bot$.
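The treatment of missing mass can be sketched in Python. The sketch assumes that a lifted program semantics is available as a function from distributions to distributions; the sentinel `BOTTOM`, the function `out_bot`, and the example program `diverge_on_zero` are hypothetical names used only for illustration.

```python
BOTTOM = "BOTTOM"  # hypothetical sentinel standing for the divergence state


def out_bot(lifted_sem, delta):
    """Extend the output distribution of a program with the frequency of
    nontermination, computed as the mass that went missing during execution."""
    out = dict(lifted_sem(delta))
    missing = sum(delta.values()) - sum(out.values())
    out[BOTTOM] = max(missing, 0.0)  # clamp tiny negative rounding errors
    return out


def diverge_on_zero(delta):
    """Hypothetical lifted semantics on integer states: loop forever on input 0,
    behave like skip on every other input."""
    return {s: f for s, f in delta.items() if s != 0}


with_divergence = out_bot(diverge_on_zero, {0: 0.25, 1: 0.75})
```

In this example a quarter of the input mass is on a state that causes divergence, so that quarter is assigned to the divergence state in the extended output distribution.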

2.3  Quantification  of  Information  Flow 

The  informal  analysis  of  PWC  in  §1.2  suggests  that  information  flow  corre¬ 
sponds  to  an  improvement  in  the  accuracy  of  an  attacker's  belief.  We  now  for¬ 
malize  that  analysis  by  using  change  in  accuracy,  as  measured  by  belief  distance 
D,  to  quantify  information  flow. 

2.3.1  Information  Flow  from  an  Outcome 

Given an experiment $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$, an outcome is a postbelief $b'_H$ such that
$b'_H \in \mathcal{B}(\mathcal{E})$, where $\mathcal{B}$ is the belief revision operator from §2.2.3. Recall from §2.1.4
that $D(b \to b')$ is the distance from belief $b$ to belief $b'$. The accuracy of the
attacker's prebelief $b_H$ in experiment $\mathcal{E}$ is $D(b_H \to \dot{\sigma}_H)$; the accuracy of outcome
$b'_H$, the attacker's postbelief, is $D(b'_H \to \dot{\sigma}_H)$.
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We  define  the  amount  of  information  flow  Q  caused  by  outcome  b'H  of  ex¬ 
periment  £  as  the  difference  of  those  two  quantities: 

Q(£,  b'n)  —  D{bH  — >  &h)  ~  D(b'H  — >  &h)- 

Thus  quantity  of  flow  Q  is  the  improvement  in  the  accuracy  of  the  attacker's  be¬ 
lief.  This  amount  can  positive  or  negative;  we  defer  discussion  of  negative  flow 
to  §2.3.3.  Since  D  is  instantiated  with  relative  entropy,  the  unit  of  measurement 
for  Q  is  (information-theoretic)  bits. 

With an additional definition from information theory, a more consequential characterization of $\mathcal{Q}$ is possible. Let $\mathcal{I}_\delta(F)$ denote the information contained in event $F$ drawn from probability distribution $\delta$:

$$\mathcal{I}_\delta(F) \triangleq -\lg \Pr\nolimits_\delta(F).$$

Information is sometimes called "surprise" because $\mathcal{I}$ quantifies how surprising an event is; for example, when an event that has probability 1 occurs, no information (0 bits) is conveyed because the occurrence is completely unsurprising.

For an attacker, the outcome of an experiment involves two unknowns: the initial high state $\sigma_H$ and the probabilistic choices made by the program. Let $\delta_S = [\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}_H) \upharpoonright L$ be the system's distribution on low outputs, and $\delta_A = [\![S]\!](\dot{\sigma}_L \otimes b_H) \upharpoonright L$ be the attacker's distribution on low outputs. $\mathcal{I}_{\delta_A}(o)$ quantifies the information contained in $o$ about both unknowns, but $\mathcal{I}_{\delta_S}(o)$ quantifies only the probabilistic choices made by the program.4 For programs that make no probabilistic choices, $\delta_A$ contains information about only the initial high state, and $\delta_S$ is a point mass at some state $\sigma$ such that $\sigma \upharpoonright L = o$. So amount of information $\mathcal{I}_{\delta_S}(o)$ is 0. For probabilistic programs, $\mathcal{I}_{\delta_S}(o)$ is generally not equal to 0; subtracting it removes all the information contained in $\mathcal{I}_{\delta_A}(o)$ that is solely about the results of probabilistic choices, leaving information only about high inputs.

4 The technique used in §2.2.4 for modeling nontermination ensures that $\delta_A$ and $\delta_S$ are probability distributions. Thus, $\mathcal{I}_{\delta_A}$ and $\mathcal{I}_{\delta_S}$ are well-defined.

The following theorem states that $\mathcal{Q}$ quantifies the information about high input $\sigma_H$ contained in observation $o$:

Theorem 2.2. $\mathcal{Q}(\mathcal{E}, b'_H) = \mathcal{I}_{\delta_A}(o) - \mathcal{I}_{\delta_S}(o)$.

Proof. In appendix 2.A. □

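As an example, consider the experiments involving PWC in §2.2.2. The first experiment $\mathcal{E}_1$ has the attacker correctly guess the password A, so

$$\mathcal{E}_1 = (\mathit{PWC}, b_H, (p \mapsto A), (g \mapsto A, a \mapsto 0)),$$

where table 2.1 defines $b_H$ (and the other beliefs used below). Only one outcome, $b'_{H1}$, is possible from this experiment. We calculate the amount of flow from this outcome, letting $\sigma_H = (p \mapsto A)$:

$$\begin{aligned}
\mathcal{Q}(\mathcal{E}_1, b'_{H1}) &= D(b_H \to \dot{\sigma}_H) - D(b'_{H1} \to \dot{\sigma}_H) \\
&= \Bigl(\sum \sigma'_H : \dot{\sigma}_H(\sigma'_H) \cdot \lg \frac{\dot{\sigma}_H(\sigma'_H)}{b_H(\sigma'_H)}\Bigr) - \Bigl(\sum \sigma'_H : \dot{\sigma}_H(\sigma'_H) \cdot \lg \frac{\dot{\sigma}_H(\sigma'_H)}{b'_{H1}(\sigma'_H)}\Bigr) \\
&= -\lg b_H(\sigma_H) + \lg b'_{H1}(\sigma_H) \\
&= 0.0291
\end{aligned}$$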


This small flow makes sense because the outcome has only confirmed something the attacker already believed to be almost certainly true. In experiment $\mathcal{E}_2$ the attacker guesses incorrectly:

$$\mathcal{E}_2 = (\mathit{PWC}, b_H, (p \mapsto C), (g \mapsto A, a \mapsto 0)).$$

Again, only one outcome is possible from this experiment, and calculating $\mathcal{Q}(\mathcal{E}_2, b'_{H2})$ yields an information flow of 5.6439 bits. This higher information flow makes sense, because the attacker's postbelief is much closer to correctly identifying the high state. The attacker's prebelief $b_H$ ascribed a 0.02 probability to the event $p \neq A$, and the information conveyed by an event with probability 0.02 is 5.6439. This suggests that $\mathcal{Q}$ is the right metric for the information about high input contained in the observation.
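The two flows above can be recomputed with a minimal Python sketch. It assumes the prebelief (0.98, 0.01, 0.01) over passwords A, B, C that appears in table 2.5 (table 2.1 is not reproduced in this section), and it takes the postbeliefs as given rather than deriving them from PWC; the function names are hypothetical.

```python
from math import log2


def distance(belief, reality):
    """Relative entropy D(belief -> reality), in bits."""
    total = 0.0
    for state, p in reality.items():
        if p == 0.0:
            continue  # zero-weight terms contribute nothing
        q = belief.get(state, 0.0)
        if q == 0.0:
            return float("inf")  # belief rules out what reality allows
        total += p * log2(p / q)
    return total


def point_mass(state, states):
    """Distribution that puts all mass on a single state."""
    return {s: (1.0 if s == state else 0.0) for s in states}


def flow(prebelief, postbelief, true_state):
    """Q = D(prebelief -> reality) - D(postbelief -> reality)."""
    reality = point_mass(true_state, prebelief)
    return distance(prebelief, reality) - distance(postbelief, reality)


prebelief = {"A": 0.98, "B": 0.01, "C": 0.01}
post_e1 = {"A": 1.0, "B": 0.0, "C": 0.0}  # guess A accepted (assumed postbelief)
post_e2 = {"A": 0.0, "B": 0.5, "C": 0.5}  # guess A rejected, as in table 2.5

q1 = flow(prebelief, post_e1, "A")  # experiment E1: password is A
q2 = flow(prebelief, post_e2, "C")  # experiment E2: password is C
```

Because reality is a point mass, each distance reduces to the negated logarithm of the probability the belief assigns to the true password, which is why `q1` and `q2` agree with the values quoted in the text.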

The  information  flow  of  5.6439  bits  in  experiment  £2  might  seem  surprisingly 
high.  At  most  two  bits  are  required  to  store  password  p  in  memory,  so  why 
does  the  program  leak  more  than  five  bits?  Here,  the  greater  leakage  occurs 
because  the  attacker's  belief  is  not  uniform.  A  uniform  prebelief  (ascribing  1/3 
probability  to  each  password  A,  B,  and  C)  would,  in  a  series  of  experiments, 
cause the attacker to learn a total of lg 3 ≈ 1.6 bits. However, belief bH is more
erroneous  than  the  uniform  belief,  so  a  larger  amount  of  information  is  required 
to  correct  it. 

An uncertainty-based definition for information flow does not produce a reasonable leakage for this experiment. The attacker's initial uncertainty about $p$ is $\mathcal{H}(b_H) = 0.1614$ bits, where $\mathcal{H}$ is the information-theoretic metric of entropy, or uncertainty, in a probability distribution $\delta$:

$$\mathcal{H}(\delta) \triangleq -\Bigl(\sum \sigma : \delta(\sigma) \cdot \lg \delta(\sigma)\Bigr).$$

In the second experiment, the attacker's final uncertainty about $p$ is $\mathcal{H}(b'_{H2}) = 1$. The reduction in uncertainty is $0.1614 - 1 = -0.8386$, hence there is actually an increase in uncertainty. So the uncertainty-based analysis that we have performed is forced to conclude that information did not flow to the attacker. But this is clearly not the case: the attacker's belief has been guided closer to reality by the experiment. The uncertainty-based analysis ignores reality by comparing $b_H$ and $b'_{H2}$ against themselves, instead of against the high state $\sigma_H$.
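The entropy figures in this comparison can be checked with a short Python sketch, again assuming the prebelief (0.98, 0.01, 0.01) and the postbelief (0, 0.5, 0.5) from table 2.5; `entropy` is a hypothetical helper name.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * log2(p) for p in dist.values() if p > 0.0)


prebelief = {"A": 0.98, "B": 0.01, "C": 0.01}
postbelief = {"A": 0.0, "B": 0.5, "C": 0.5}

initial_uncertainty = entropy(prebelief)
final_uncertainty = entropy(postbelief)
reduction = initial_uncertainty - final_uncertainty
```

The negative value of `reduction` is the increase in uncertainty discussed above, even though the postbelief assigns far more probability to the true password C than the prebelief did.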
2.3.2 Interpreting Metric Q

According to theorem 2.2, metric $\mathcal{Q}$ correctly quantifies the amount of information flow, in bits. But what does it mean to leak one bit of information? The next theorem states that $k$ bits of leakage correspond to a $k$-fold doubling of the probability that the attacker ascribes to reality.

Theorem 2.3. Let $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$. Then:

$$\mathcal{Q}(\mathcal{E}, b'_H) = k \;\equiv\; b'_H(\sigma_H) = 2^k \cdot b_H(\sigma_H).$$

Proof. In appendix 2.A. □

Suppose an attacker were to guess what reality is by sampling from his belief $b_H$; the probability he guesses correctly is $b_H(\sigma_H)$. Thus, by theorem 2.3, one bit of leakage makes the attacker twice as likely to guess correctly. This reveals an interesting analogy with the uncertainty-based definition. In it, one bit of leakage corresponds to the attacker becoming twice as certain about the high state, though he may, as the example in §2.3.1 shows, become certain about the wrong high state. However, one bit of leakage in our accuracy-based definition corresponds to the attacker becoming twice as certain about the correct high state.
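The doubling interpretation can be seen from a short derivation. This is only a sketch under the assumption that $D$ is relative entropy and that reality is the point mass $\dot{\sigma}_H$; it is not the proof given in appendix 2.A.

```latex
\begin{aligned}
D(b \to \dot{\sigma}_H)
  &= \sum_{\sigma'_H} \dot{\sigma}_H(\sigma'_H)\,
     \lg\frac{\dot{\sigma}_H(\sigma'_H)}{b(\sigma'_H)}
   = -\lg b(\sigma_H), \\
\mathcal{Q}(\mathcal{E}, b'_H)
  &= -\lg b_H(\sigma_H) + \lg b'_H(\sigma_H)
   = \lg\frac{b'_H(\sigma_H)}{b_H(\sigma_H)}, \\
\mathcal{Q}(\mathcal{E}, b'_H) = k
  &\iff b'_H(\sigma_H) = 2^{k}\cdot b_H(\sigma_H).
\end{aligned}
```

For instance, in experiment $\mathcal{E}_2$ of §2.3.1 the flow is 5.6439 bits, and $2^{5.6439} \cdot 0.01 \approx 0.5$, which is the probability the postbelief in table 2.5 assigns to the true password C.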


2.3.3  Accuracy,  Uncertainty,  and  Misinformation 

Accuracy and uncertainty are orthogonal properties of beliefs, as depicted in figure 2.3. The figure shows the change in an attacker's accuracy and uncertainty when the program

$$\mathit{FLIP} : l := h \;\; {}_{0.99}[] \;\; l := \neg h$$

[Figure 2.3: Effect of FLIP on postbelief. Horizontal axis: less accurate to more accurate. Vertical axis: less certain to more certain. Quadrant I (more accurate, more certain): $b_H = (0.5, 0.5)$, $o = (l \mapsto 0)$. Quadrant II (less accurate, more certain): $b_H = (0.5, 0.5)$, $o = (l \mapsto 1)$. Quadrant III (less accurate, less certain): $b_H = (0.99, 0.01)$, $o = (l \mapsto 1)$. Quadrant IV (more accurate, less certain): $b_H = (0.01, 0.99)$, $o = (l \mapsto 0)$.]


Table 2.3: Analysis of FLIP

Quadrant                    I          II         III        IV
b_H(h ↦ 0)                  0.5        0.5        0.99       0.01
b_H(h ↦ 1)                  0.5        0.5        0.01       0.99
o                           (l ↦ 0)    (l ↦ 1)    (l ↦ 1)    (l ↦ 0)
b'_H(h ↦ 0)                 0.99       0.01       0.5        0.5
b'_H(h ↦ 1)                 0.01       0.99       0.5        0.5
Increase in accuracy        +0.9855    -5.6439    -0.9855    +5.6439
Reduction in uncertainty    +0.9192    +0.9192    -0.9192    -0.9192

is analyzed with experiment $\mathcal{E} = (\mathit{FLIP}, b_H, (h \mapsto 0), (l \mapsto 0))$ and observation $o$ is generated by the experiment. The notation $b_H = (x, y)$ in figure 2.3 means that $b_H(h \mapsto 0) = x$ and $b_H(h \mapsto 1) = y$.

Usually, FLIP sets $l$ to be $h$, so the attacker will expect this to be the case. Executions in which this occurs will cause his postbelief to be more accurate, but may cause his uncertainty to either increase or decrease, depending on his prebelief; when uncertainty increases, an uncertainty metric would mistakenly say that no flow has occurred.

With probability 0.01, FLIP produces an execution that fools the attacker and sets $l$ to be $\neg h$, causing his belief to become less accurate. The decrease in accuracy results in misinformation, which is a negative information flow. When the attacker's prebelief is almost completely accurate, such executions will make him more uncertain. But when the attacker's prebelief is uniform, executions that result in misinformation will make him less uncertain; when uncertainty decreases, an uncertainty metric would mistakenly say that flow has occurred.

Table 2.3 concretely demonstrates the orthogonality of accuracy and uncertainty. The quadrant labels refer to figure 2.3. The attacker's prebelief $b_H$, observation $o$, and resulting postbelief $b'_H$ are given in the top half of the table. In the bottom half of the table, increase in accuracy is calculated using information flow metric $\mathcal{Q}$, and reduction in uncertainty is calculated using the difference in entropy $\mathcal{H}(b_H) - \mathcal{H}(b'_H)$. The symmetries in the bottom half of the table are a result of the symmetries between prebeliefs and postbeliefs. Quadrants II and IV, for example, have exchanged these beliefs, which for both metrics has the effect of negating the amount of information flow.
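The entries of table 2.3 can be recomputed with a minimal Python sketch. It assumes that belief revision for FLIP amounts to Bayesian conditioning on the observed value of $l$, and that the true high state is $h = 0$ as in the experiment above; all function names are hypothetical.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0.0)


def flip_likelihood(l, h):
    """Pr(l | h) under FLIP: l equals h with probability 0.99."""
    return 0.99 if l == h else 0.01


def revise(prebelief, l):
    """Condition a belief about h on the observed value of l."""
    joint = {h: p * flip_likelihood(l, h) for h, p in prebelief.items()}
    total = sum(joint.values())
    return {h: w / total for h, w in joint.items()}


def analyze(prebelief, l, true_h=0):
    """Return (increase in accuracy, reduction in uncertainty) in bits."""
    post = revise(prebelief, l)
    accuracy_gain = log2(post[true_h] / prebelief[true_h])
    uncertainty_drop = entropy(prebelief) - entropy(post)
    return accuracy_gain, uncertainty_drop


quadrants = {
    "I": ({0: 0.5, 1: 0.5}, 0),
    "II": ({0: 0.5, 1: 0.5}, 1),
    "III": ({0: 0.99, 1: 0.01}, 1),
    "IV": ({0: 0.01, 1: 0.99}, 0),
}
results = {name: analyze(b, l) for name, (b, l) in quadrants.items()}
```

Each entry of `results` pairs the accuracy change with the uncertainty change, so the four sign combinations of the quadrants can be read off directly.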

The probabilistic choice in FLIP is essential for producing misinformation, as shown by the following theorem. Let $\mathbf{Det}$ be the set of syntactically deterministic programs, i.e., programs that do not contain any probabilistic choice. Because they lack a source of randomness, these programs cannot decrease the accuracy of an attacker's belief:

Theorem 2.4. $S \in \mathbf{Det} \implies \forall \mathcal{E}, b'_H \in \mathcal{B}(\mathcal{E}) \,.\, \mathcal{Q}(\mathcal{E}, b'_H) \geq 0$.

Proof. In appendix 2.A. □

2.3.4  Emulating  Uncertainty 

The accuracy metric of §2.3.1 generalizes uncertainty metrics. Informally, this is because uncertainty metrics recognize only two distributions (belief before and after execution), whereas our framework recognizes these plus one additional distribution (reality). By ignoring reality, our framework can produce the same results as many uncertainty metrics. Here we show how to emulate the metric of Clark et al. [25].

Let $A$, $B$, and $C$ be random variables. The conditional mutual information $\mathcal{I}(A, B \mid C)$ is the amount of uncertainty about the value of $A$ that is resolved by learning the value of $B$, given prior knowledge of the value of $C$ [32]. Conditional mutual information is defined using a generalization of the entropy function from §2.3.1 to conditional entropy [32]:

$$\begin{aligned}
\mathcal{I}(A, B \mid C) &= \mathcal{H}(A \mid C) - \mathcal{H}(A \mid B, C) \\
&= \sum_{a,b,c} \Pr(a, b, c) \lg \frac{\Pr(a, b \mid c)}{\Pr(a \mid c) \cdot \Pr(b \mid c)}.
\end{aligned}$$

In this definition, $a$ abbreviates $A = a$, etc. The probability is taken with respect to the joint distribution on $A$, $B$, and $C$.

The metric of Clark et al. states that the amount of information flow $\mathcal{L}$ from high input $H_{in}$ into low output $L_{out}$, given low input $L_{in}$, is the mutual information between $H_{in}$ and $L_{out}$, given $L_{in}$:

$$\mathcal{L}(H_{in}, L_{in}, L_{out}) \triangleq \mathcal{I}(H_{in}, L_{out} \mid L_{in}).$$


First, to instantiate our framework to that of Clark et al., we force our framework to ignore reality by introducing an admissibility restriction (cf. §2.1.4): prebeliefs must be identical to the system's chosen high input distribution. This means that prebeliefs must be correct; there can be no error in the attacker's estimate of the probability distribution on high inputs.

Second, we adjust the definition of belief. The uncertainty model of Clark et al. calculates information flow as an expectation over a probability distribution on both low and high inputs. We could model this using the techniques about to be introduced in §2.3.5 and §2.3.6, but because of the admissibility restriction just made, it is equivalent and simpler to allow beliefs to range over low state as well as high state. As before, we assume that high state remains constant using the copying technique of §2.2.4. Since beliefs now include low state, we must also apply this technique to ensure that the initial values of low variables are preserved in the state. Let the low input component of the state be denoted $L^{\circ}$. Assume that the attacker's prebelief $b$ ranges over $L^{\circ} \cup H^{\circ}$, whereas his postbelief $b'$ ranges over $L^{\circ} \cup H^{\circ} \cup L \cup H$.

5 Their metric more generally allows the quantification of information flow into any subset of the output variables. The approach we give here can similarly be generalized.

We want to establish that accuracy metric $\mathcal{Q}$ yields the same result as uncertainty metric $\mathcal{L}$ for any outcome. Recall that $\mathcal{Q}$ is defined in terms of distance function $D$. Our previous instantiation of $D$ as relative entropy yielded an accuracy metric. Now we reinstantiate $D$ using (non-relative) entropy:

$$D(b \to b') \triangleq \mathcal{H}(b \upharpoonright (L \cup L^{\circ} \cup H^{\circ})) - \mathcal{H}(b \upharpoonright (L \cup L^{\circ})).$$

Observe that this instantiation ignores argument $b'$, the belief representing reality. Let $H_{in} = \dot{\sigma}_H$, $L_{in} = \dot{\sigma}_L$, and $L_{out} = \delta' \upharpoonright L$, where $\delta'$ is the output distribution from the experiment protocol. This yields that amount of information flow $\mathcal{Q}$ is the same as uncertainty metric $\mathcal{L}$:

Theorem 2.5. $\mathcal{Q}(\mathcal{E}, b') = \mathcal{L}(H_{in}, L_{in}, L_{out})$.

Proof. In appendix 2.A. □

We discuss another relationship between accuracy and uncertainty in §2.3.6.

2.3.5 Expected Flow for an Experiment

Since an experiment on a probabilistic program can produce many observations, and therefore many outcomes, it is desirable to characterize expected flow over those outcomes. So we define expected flow $\mathcal{Q}_E$ over all observations from experiment $\mathcal{E}$:

$$\begin{aligned}
\mathcal{Q}_E(\mathcal{E}) &\triangleq \mathrm{E}_{o \in \delta' \upharpoonright L}[\mathcal{Q}(\mathcal{E}, \mathcal{B}(\mathcal{E}, o))] \\
&= \Bigl(\sum o : (\delta' \upharpoonright L)(o) \cdot \mathcal{Q}(\mathcal{E}, ([\![S]\!](\dot{\sigma}_L \otimes b_H) \mid o) \upharpoonright H)\Bigr),
\end{aligned}$$

where $\delta' \upharpoonright L = [\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}_H) \upharpoonright L$ is the distribution on observations; $\mathrm{E}_{y \in \delta}[X(y)]$ is the expected value of expression $X$, which has free variable $y$, with respect to distribution $\delta$; and $\mathcal{B}$ is the belief revision operator from §2.2.3.

Expected flow is useful in analyzing probabilistic programs. Consider a faulty password checker:

FPWC : if p = g then a := 1 else a := 0;
       a := ¬a  0.1[]  skip

With probability 0.1, FPWC inverts the authentication flag. Can this program be expected to confound attackers, that is, does FPWC leak less expected information than PWC? This question can be answered by comparing the expected flow from FPWC to the flow of PWC. Table 2.4 gives information flows from FPWC for experiments $\mathcal{E}_1^F$ and $\mathcal{E}_2^F$, which are identical to $\mathcal{E}_1$ and $\mathcal{E}_2$ from §2.3.1, except that they execute FPWC instead of PWC. Observation $(a \mapsto 0)$ in $\mathcal{E}_1^F$ and observation $(a \mapsto 1)$ in $\mathcal{E}_2^F$ correspond to an execution where the value of $a$ is inverted. The flow for the outcomes resulting from these observations is negative, indicating that the program is giving the attacker misinformation. Note that, for both pairs of experiments in table 2.4, the expected flow of FPWC is less than the flow of PWC. We have confirmed that the random corruption of $a$ makes it more difficult for the attacker to increase the accuracy of his belief.

Table 2.4: Leakage of PWC and FPWC

E        o          Q(E, B(E, o))    Q_E(E)
E_1      (a ↦ 1)    0.0291           0.0291
         (a ↦ 0)    impossible
E_1^F    (a ↦ 1)    0.0258           0.0018
         (a ↦ 0)    -0.2142
E_2      (a ↦ 1)    impossible       5.6439
         (a ↦ 0)    5.6439
E_2^F    (a ↦ 1)    -3.1844          2.3421
         (a ↦ 0)    2.9561
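The expected flows in table 2.4 can be approximated with a minimal Python sketch. It assumes the prebelief (0.98, 0.01, 0.01) from table 2.5, models belief revision as Bayesian conditioning on the observed flag $a$, and treats PWC as FPWC with flip probability 0; the function names are hypothetical. Exact arithmetic can differ from the tabulated values in the third or fourth decimal place.

```python
from math import log2


def likelihood(a, password, guess, flip_prob):
    """Pr(a | password, guess): the flag is set, then inverted with flip_prob."""
    correct = 1 if password == guess else 0
    return 1.0 - flip_prob if a == correct else flip_prob


def revise(prebelief, guess, a, flip_prob):
    """Condition the belief about the password on the observed flag."""
    joint = {pw: p * likelihood(a, pw, guess, flip_prob)
             for pw, p in prebelief.items()}
    total = sum(joint.values())
    return {pw: w / total for pw, w in joint.items()}


def expected_flow(prebelief, password, guess, flip_prob):
    """Average the flow Q over the system's distribution on observations."""
    total = 0.0
    for a in (0, 1):
        weight = likelihood(a, password, guess, flip_prob)
        if weight == 0.0:
            continue  # observation is impossible in this experiment
        post = revise(prebelief, guess, a, flip_prob)
        total += weight * log2(post[password] / prebelief[password])
    return total


prebelief = {"A": 0.98, "B": 0.01, "C": 0.01}
pwc_e1 = expected_flow(prebelief, "A", "A", 0.0)
fpwc_e1 = expected_flow(prebelief, "A", "A", 0.1)
pwc_e2 = expected_flow(prebelief, "C", "A", 0.0)
fpwc_e2 = expected_flow(prebelief, "C", "A", 0.1)
```

In both pairs the faulty checker yields a smaller expected flow than the correct one, matching the comparison drawn from the table.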

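Expected flow can be conservatively approximated by conditioning on a single distribution rather than conditioning on many observations. Conditioning $\delta$ on $\delta_L$ has the effect of making the low projection of $\delta$ identical to $\delta_L$, while leaving the high projection of $\delta \mid \sigma_L$ unchanged for all $\sigma_L$:

$$\delta \mid \delta_L \triangleq \lambda\sigma \,.\, \frac{\delta(\sigma)}{(\delta \upharpoonright L)(\sigma \upharpoonright L)} \cdot \delta_L(\sigma \upharpoonright L).$$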

A bound on expected flow is then calculated as follows. Given experiment $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$, let $\delta'$ be the distribution that results from the system executing $S$ as in step 4 of the experiment protocol, i.e., $\delta' = [\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}_H)$. In the experiment protocol, an attacker would observe the low projection of a state from $\delta'$. But suppose that the attacker instead observed the low projection of $\delta'$ itself. (This projection is the distribution over observations that the attacker would approach if he continued to repeat $\mathcal{E}$.) Let $e_H$ be the postbelief that results from conditioning on this distribution, as in step 5 of the protocol: $e_H = (([\![S]\!](\dot{\sigma}_L \otimes b_H)) \mid (\delta' \upharpoonright L)) \upharpoonright H$. Intuitively, $e_H$ is the attacker's expected postbelief with respect to $\delta' \upharpoonright L$. The amount of information flow from expected postbelief $e_H$ then bounds the expected amount of information flow:

Theorem 2.6. Let:

$$\begin{aligned}
\mathcal{E} &= (S, b_H, \sigma_H, \sigma_L), \\
\delta' &= [\![S]\!](\dot{\sigma}_L \otimes \dot{\sigma}_H), \\
e_H &= (([\![S]\!](\dot{\sigma}_L \otimes b_H)) \mid (\delta' \upharpoonright L)) \upharpoonright H.
\end{aligned}$$

Then:

$$\mathcal{Q}_E(\mathcal{E}) \leq \mathcal{Q}(\mathcal{E}, e_H).$$

Proof. In appendix 2.A. □

As an example, consider experiment $\mathcal{E}_2^F$. Calculating the attacker's expected postbelief $e_H$ in this experiment yields $e_H = (0.8601, 0.0699, 0.0699)$, using the postbelief notation from §2.3.3. Bound $\mathcal{Q}(\mathcal{E}, e_H)$ from theorem 2.6 is thus 6.4264 bits, which is indeed greater than expected flow $\mathcal{Q}_E$ as calculated in table 2.4.
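The expected postbelief can be recomputed with a minimal Python sketch. It assumes the experiment with password C, guess A, flip probability 0.1, and prebelief (0.98, 0.01, 0.01) from table 2.5, and it uses the fact that conditioning on a distribution over observations and then projecting onto the high state is a weighted mixture of the per-observation postbeliefs. The sketch reproduces $e_H$ and checks the inequality of theorem 2.6 numerically; the names are hypothetical.

```python
from math import log2

FLIP_PROB = 0.1  # probability that FPWC inverts the flag


def likelihood(a, password, guess):
    """Pr(a | password, guess) under FPWC."""
    correct = 1 if password == guess else 0
    return 1.0 - FLIP_PROB if a == correct else FLIP_PROB


def revise(prebelief, guess, a):
    """Condition the belief about the password on the observed flag."""
    joint = {pw: p * likelihood(a, pw, guess) for pw, p in prebelief.items()}
    total = sum(joint.values())
    return {pw: w / total for pw, w in joint.items()}


prebelief = {"A": 0.98, "B": 0.01, "C": 0.01}
password, guess = "C", "A"

# System's distribution on observations (low projection of delta').
observation_dist = {a: likelihood(a, password, guess) for a in (0, 1)}
posts = {a: revise(prebelief, guess, a) for a in (0, 1)}

# Expected postbelief: mixture of postbeliefs weighted by observation_dist.
expected_postbelief = {
    pw: sum(observation_dist[a] * posts[a][pw] for a in (0, 1))
    for pw in prebelief
}

expected_flow = sum(
    observation_dist[a] * log2(posts[a][password] / prebelief[password])
    for a in (0, 1)
)
bound = log2(expected_postbelief[password] / prebelief[password])
```

Under these assumptions `expected_flow` does not exceed `bound`, as the theorem requires.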


2.3.6  Expected  Flow  over  All  Experiments 

Uncertainty-based metrics typically consider the expected information flow over all experiments, rather than the flow in a single experiment. An analysis, like ours, based on single experiments allows a more expressive language of security properties in which particular inputs or experiments can be considered. Moreover, our analysis can be extended to calculate expected flow over all experiments.

Rather than choosing particular high input states $\sigma_H$, the system may choose distribution $\delta_H$ over high states. A distribution over high inputs could be used, for example, to determine the expected flow of the password checker when users' choice of passwords can be described by a distribution. Distribution $\delta_H$ is sampled to produce the initial high input state. Taking the expectation in $\mathcal{Q}_E$ with respect to both $\sigma_H$ and $o$ then yields the expected flow over all experiments for a given low input $\sigma_L$.

The expected flow over all experiments can be characterized using conditional mutual information (cf. §2.3.4). Let $H_{in}$ denote the distribution over high inputs, $L_{in}$ over low inputs, and $L_{out}$ over low outputs. For an experiment $\mathcal{E} = (S, b_H, \delta_H, \sigma_L)$, distribution $H_{in}$ is $\delta_H$, distribution $L_{in}$ is $\dot{\sigma}_L$, and distribution $L_{out}$ is $\delta' \upharpoonright L$, where $\delta'$ is the output distribution from the experiment protocol. If system distribution $\delta_H$ is identical to attacker prebelief $b_H$ (i.e., there is no error in the attacker's estimate of the probability distribution on high inputs), the expected flow over all experiments for a given low input is equal to the conditional mutual information between $H_{in}$ and $L_{out}$ given $L_{in}$:

Theorem 2.7. Let $\mathcal{E} = (S, b_H, \delta_H, \sigma_L)$, where $b_H = \delta_H$. Then:

$$\mathcal{Q}_E(\mathcal{E}) = \mathcal{I}(H_{in}, L_{out} \mid L_{in}).$$

Proof. In appendix 2.A. □

This theorem means that our metric for expected information flow agrees with uncertainty metrics (such as Clark et al. [24]) if attackers have beliefs that do not differ from reality, that is, if the attacker's belief is equal to the system's distribution on high inputs. This requirement is unsurprising, because uncertainty metrics do not distinguish between beliefs and reality.
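The agreement stated by theorem 2.7 can be checked numerically on a small case. The following Python sketch assumes a deterministic password checker over three passwords, a uniform system distribution that the attacker's prebelief matches exactly, and a fixed guess A, so conditioning on the low input is trivial; these choices and all names are assumptions made for illustration.

```python
from math import log2

states = ("A", "B", "C")
prior = {pw: 1.0 / 3.0 for pw in states}  # system distribution = prebelief
guess = "A"


def observe(password):
    """Deterministic PWC: the flag is 1 exactly when the guess is right."""
    return 1 if password == guess else 0


def revise(prebelief, a):
    """Keep only the passwords consistent with the observed flag."""
    consistent = {pw: p for pw, p in prebelief.items() if observe(pw) == a}
    total = sum(consistent.values())
    return {pw: consistent.get(pw, 0.0) / total for pw in prebelief}


# Expected flow over all experiments: average Q over the high inputs.
expected_flow = sum(
    prior[pw] * log2(revise(prior, observe(pw))[pw] / prior[pw])
    for pw in states
)

# Mutual information between high input and low output, from the joint.
output_dist = {}
for pw, p in prior.items():
    output_dist[observe(pw)] = output_dist.get(observe(pw), 0.0) + p
joint = {(pw, observe(pw)): p for pw, p in prior.items()}
mutual_information = sum(
    p * log2(p / (prior[pw] * output_dist[a])) for (pw, a), p in joint.items()
)
```

Both quantities come out to the entropy of the output flag, roughly 0.918 bits in this toy case.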

The attacker may also choose distribution $\delta_L$ over low states. This extension increases the expressive power of the experiment model; for example, the attacker can use $\delta_L$ to express a randomized guessing strategy. His distribution might also be a function of his belief; we do not address such attacker strategies here.

2.3.7 Maximum Information Flow

System designers are likely to want to limit the maximum possible information flow. We characterize the maximum amount of information flow that program $S$ can cause in a single outcome as the maximum amount of flow from any outcome of any experiment $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$ on $S$:

$$\mathcal{Q}_{max}(S) \triangleq \max\{\mathcal{Q}(\mathcal{E}, b'_H) \mid \mathcal{E}, b'_H \in \mathcal{B}(\mathcal{E})\}.$$

Consider applying $\mathcal{Q}_{max}$ to PWC. Assume that $b_H$ is a uniform distribution, representing a lack of belief for any particular password, over $k$-bit passwords. If the attacker guesses correctly, the maximum leakage is $k$ bits according to $\mathcal{Q}_{max}$. But if the attacker guesses incorrectly, PWC can leak at most $k - \lg(2^k - 1)$ bits in an outcome; for $k \geq 14$ this is less than 0.0001 bits.
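The two leakage figures for a uniform prebelief can be evaluated with a small Python sketch. It assumes that a correct guess collapses the belief onto the true password and that an incorrect guess leaves a uniform belief over the remaining $2^k - 1$ passwords; the function names are hypothetical.

```python
from math import log2


def correct_guess_leak(k):
    """Flow when a uniform belief over 2**k passwords collapses to the truth."""
    return log2(2 ** k)  # probability of the truth rises from 1/2**k to 1


def wrong_guess_leak(k):
    """Flow when one wrong password is ruled out of 2**k candidates."""
    n = 2 ** k
    return log2(n / (n - 1))  # equals k - lg(2**k - 1)


leaks = {k: wrong_guess_leak(k) for k in (2, 8, 12, 13, 14, 16)}
```

The dictionary `leaks` shows how quickly the leakage from a single failed guess shrinks as the password length grows.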

Uncertainty metrics typically declare that the maximum possible information flow is $\lg |\mathbf{State}_H|$; this is the number of bits necessary to store the high state. This was true for the example of $k$-bit passwords above. However, as experiment $\mathcal{E}_2$ from §2.3.1 shows, this declaration is valid only if the attacker's prebelief is no more inaccurate than the uniform distribution. Thus uncertainty metrics make an implicit restriction on attacker beliefs that our accuracy metric does not.

2.3.8  Repeated  Experiments 

Nothing precludes performing a series of experiments. The most interesting case has the attacker return to step 2b of the experiment protocol in figure 2.2 after updating his belief in step 5; that is, the system keeps the high input to the program constant, and the attacker is allowed to check new low inputs based on the results of previous experiments.

Table 2.5: Repeated experiments on PWC

Repetition        1         2
b_H:   A          0.98      0
       B          0.01      0.5
       C          0.01      0.5
σ_L(g)            A         B
o(a)              0         0
b'_H:  A          0         0
       B          0.5       0
       C          0.5       1
Q(E, b'_H)        5.6439    1.0

Suppose that experiment $\mathcal{E}_2$ from §2.3.1 is conducted and repeated with $\sigma_L = (g \mapsto B)$. Then the attacker's belief about the password evolves as shown in table 2.5. Summing the information flow for each experiment yields a total information flow of 6.6439. This total corresponds to what $\mathcal{Q}$ would calculate for a single experiment, if that experiment changed prebelief $b_H$ to postbelief $b'_{H2}$, where $b'_{H2}$ is the attacker's final postbelief in table 2.5:

$$D(b_H \to \dot{\sigma}_H) - D(b'_{H2} \to \dot{\sigma}_H) = 6.6439 - 0 = 6.6439$$

This example suggests that, given a series of experiments in which the postbelief from one experiment becomes the prebelief to the next, the final postbelief contains all the information learned during the series. Let $\mathcal{E}_i = (S, b_{H_i}, \sigma_H, \sigma_{L_i})$ be the $i$th experiment in the series, and let $b'_{H_i}$ be the outcome from $\mathcal{E}_i$. Let prebelief $b_{H_i}$ in experiment $\mathcal{E}_i$ be chosen as postbelief $b'_{H_{i-1}}$ from experiment $\mathcal{E}_{i-1}$. Let $b_{H_1}$ be the attacker's prebelief for the entire series. Let $n$ be the length of the series. The following theorem states that the final postbelief does contain all the information:

Theorem 2.8. $D(b_{H_1} \to \dot{\sigma}_H) - D(b'_{H_n} \to \dot{\sigma}_H) = \bigl(\sum i : 1 \leq i \leq n : \mathcal{Q}(\mathcal{E}_i, b'_{H_i})\bigr)$.

Proof. Immediate by the definition of $\mathcal{Q}$ and arithmetic. □

Consequently,  our  experiment  model  enables  compositional  reasoning  about 
series  of  attacks. 
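The telescoping sum behind theorem 2.8 can be illustrated on the series in table 2.5. The Python sketch below assumes deterministic PWC, password C, and the guesses A then B; belief revision is modeled as discarding passwords that are inconsistent with the observed flag, and the names are hypothetical.

```python
from math import log2


def revise(prebelief, guess, a):
    """Condition on PWC's output: a is 1 exactly when the guess is right."""
    consistent = {pw: p for pw, p in prebelief.items()
                  if (pw == guess) == (a == 1)}
    total = sum(consistent.values())
    return {pw: consistent.get(pw, 0.0) / total for pw in prebelief}


def distance_to_truth(belief, password):
    """D(belief -> point mass on password); assumes belief[password] > 0."""
    return -log2(belief[password])


password = "C"
initial = {"A": 0.98, "B": 0.01, "C": 0.01}

belief = dict(initial)
flows = []
for guess in ("A", "B"):
    a = 1 if guess == password else 0
    post = revise(belief, guess, a)
    flows.append(distance_to_truth(belief, password)
                 - distance_to_truth(post, password))
    belief = post  # postbelief becomes the next prebelief

total = sum(flows)
direct = (distance_to_truth(initial, password)
          - distance_to_truth(belief, password))
```

The intermediate distances cancel pairwise, so `total` and `direct` coincide.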

2.3.9  Number  of  Experiments 

Attackers conduct experiments to refine their beliefs. This suggests another quantification of the security of a program: the number of experiments required for an attacker to refine his belief to within some distance of reality. For simplicity, assume that program $S$ is deterministic,6 such that only one observation is possible from an experiment. Then belief revision $\mathcal{B}$ (from §2.2.3) can be used as a function from experiments to postbeliefs. Let $A : \mathbf{Belief} \to \mathbf{State}_L$ be the attacker's strategy for choosing low inputs based on his beliefs. Define the $i$th iteration of $\mathcal{B}$ as $\mathcal{B}^i$:

$$\begin{aligned}
\mathcal{B}^i(S, b_H, \sigma_H, A) &\triangleq \mathcal{B}(S, b'_H, \sigma_H, A(b'_H)), \quad \text{where } b'_H = \mathcal{B}^{i-1}(S, b_H, \sigma_H, A); \\
\mathcal{B}^1(S, b_H, \sigma_H, A) &\triangleq \mathcal{B}(S, b_H, \sigma_H, A(b_H)).
\end{aligned}$$

Then the number of experiments $\mathcal{N}$ needed to achieve a postbelief within distance $\epsilon$ of reality is:

$$\mathcal{N}(S, b_H, \sigma_H, A) \triangleq \min\{i \mid D(\mathcal{B}^i(S, b_H, \sigma_H, A) \to \dot{\sigma}_H) \leq \epsilon\}.$$
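The definition of $\mathcal{N}$ can be illustrated for the deterministic password checker with a minimal Python sketch. The strategy used here, which always guesses the password currently believed most likely, is an assumption chosen for illustration, as are the function names and the iteration limit.

```python
from math import log2


def revise(prebelief, guess, a):
    """Condition on PWC's output: a is 1 exactly when the guess is right."""
    consistent = {pw: p for pw, p in prebelief.items()
                  if (pw == guess) == (a == 1)}
    total = sum(consistent.values())
    return {pw: consistent.get(pw, 0.0) / total for pw in prebelief}


def strategy(belief):
    """Hypothetical strategy A: guess the most probable password."""
    return max(belief, key=belief.get)


def experiments_needed(prebelief, password, epsilon, limit=100):
    """Smallest i such that the i-th postbelief is within epsilon of reality."""
    belief = dict(prebelief)
    for i in range(1, limit + 1):
        guess = strategy(belief)
        a = 1 if guess == password else 0
        belief = revise(belief, guess, a)
        if -log2(belief[password]) <= epsilon:
            return i
    return None  # distance not reached within the iteration limit


prebelief = {"A": 0.98, "B": 0.01, "C": 0.01}
n_half_bit = experiments_needed(prebelief, "C", 0.5)
n_one_bit = experiments_needed(prebelief, "C", 1.0)
```

With password C, one experiment brings the belief to within one bit of reality, and a second is needed to get within half a bit.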

As discussed in §2.3.2, when an attacker's belief is $k$ bits distant from reality, the probability he ascribes to the correct high state is $1/2^k$. If the attacker were to guess a high state by sampling from his belief, he would therefore guess correctly with probability $1/2^{\epsilon}$ after $\mathcal{N}$ experiments.

6 If program $S$ is probabilistic, $\mathcal{B}(\mathcal{E})$ could instead be defined as a random variable giving the probability with which the attacker holds a postbelief. This would allow the definition of the expected number of experiments to achieve a distance from reality.

$$\begin{aligned}
[\![\mathbf{skip}]\!]\sigma &= \dot{\sigma} \\
[\![v := E]\!]\sigma &= \dot{\sigma}[v \mapsto E] \\
[\![S_1; S_2]\!]\sigma &= [\![S_2]\!]^{*}([\![S_1]\!]\sigma) \\
[\![\mathbf{if}\ B\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2]\!]\sigma &= \text{if } [\![B]\!]\sigma \text{ then } [\![S_1]\!]\sigma \text{ else } [\![S_2]\!]\sigma \\
[\![\mathbf{while}\ B\ \mathbf{do}\ S]\!] &= \mathit{fix}(\lambda d : \mathbf{State} \to \mathbf{Dist} \,.\, \lambda\sigma \,.\, \text{if } [\![B]\!]\sigma \text{ then } d^{*}([\![S]\!]\sigma) \text{ else } \dot{\sigma}) \\
[\![S_1 \;{}_p[]\; S_2]\!]\sigma &= p \cdot [\![S_1]\!]\sigma + (1 - p) \cdot [\![S_2]\!]\sigma
\end{aligned}$$

Figure 2.4: State semantics of programs

Sometimes an attacker needs only to reach a belief that is close to reality. For example, if the high state is a Cartesian coordinate, the attacker might need only to bound the coordinate within some Cartesian distance. Let $\mathrm{ball}_\gamma(\sigma_H)$ be all the high states within distance $\gamma$ of $\sigma_H$ according to a distance metric $M$ on $\mathbf{State}_H$:

$$\mathrm{ball}_\gamma(\sigma_H) \triangleq \{\sigma'_H \mid M(\sigma_H, \sigma'_H) \leq \gamma\}.$$

Then the number of experiments needed to achieve some distance $\epsilon$ from some ball $\gamma$ around reality is:

$$\mathcal{N}(S, b_H, \sigma_H, A) \triangleq \min\{i \mid \sigma'_H \in \mathrm{ball}_\gamma(\sigma_H) \wedge D(\mathcal{B}^i(S, b_H, \sigma_H, A) \to \dot{\sigma}'_H) \leq \epsilon\}.$$

2.4  Language  Semantics 

The last technical piece we require is a semantics $[\![S]\!]$ in which programs denote functions that map distributions to distributions. Here we build such a semantics in two stages. First, we build a simpler semantics that maps states to distributions. Second, we lift that semantics so that it operates on distributions.

Our first task then is to define the semantics $[\![S]\!] : \mathbf{State} \to \mathbf{Dist}$. That semantics is given in figure 2.4. We assume a semantics $[\![E]\!] : \mathbf{State} \to \mathbf{Val}$ that gives meaning to expressions, and a semantics $[\![B]\!] : \mathbf{State} \to \mathbf{Bool}$ that gives meaning to Boolean expressions.

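The statements skip and if have essentially the same denotations as in the standard deterministic case. State update $\sigma[v \mapsto V]$, where $V \in \mathbf{Val}$, changes the value of $v$ to $V$ in $\sigma$. The distribution update $\delta[v \mapsto E]$ in the denotation of assignment represents the result of substituting the meaning of $E$ for $v$ in all the states of $\delta$:

$$\delta[v \mapsto E] \triangleq \lambda\sigma \,.\, \Bigl(\sum \sigma' : \sigma'[v \mapsto [\![E]\!]\sigma'] = \sigma : \delta(\sigma')\Bigr).$$

The semantics of while and sequential composition $S_1; S_2$ use lifting operator $*$, which lifts function $d : \mathbf{State} \to \mathbf{Dist}$ to function $d^{*} : \mathbf{Dist} \to \mathbf{Dist}$, as suggested by §2.1.2:

$$\begin{aligned}
d^{*} &\triangleq \lambda\delta \,.\, \Bigl(\sum \sigma : \delta(\sigma) \cdot d(\sigma)\Bigr) \\
&= \lambda\delta \,.\, \lambda\sigma \,.\, \Bigl(\sum \sigma' : \delta(\sigma') \cdot d(\sigma')(\sigma)\Bigr),
\end{aligned}$$

where the equality follows from $\eta$-reduction, and $\cdot$ and $+$ are used as pointwise operators:

$$\begin{aligned}
p \cdot \delta &\triangleq \lambda\sigma \,.\, p \cdot \delta(\sigma), \\
\delta_1 + \delta_2 &\triangleq \lambda\sigma \,.\, \delta_1(\sigma) + \delta_2(\sigma).
\end{aligned}$$

Lifted $d^{*}$ is thus the expected value (which is a distribution) of $d$ with respect to distribution $\delta$.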

To ensure that the fixed point for while exists, we must verify that $\mathbf{Dist}$ is a complete partial order with a bottom element and that $[\![\cdot]\!]$ is continuous. We omit the proof here, as it is a consequence of a theorem proved by Kozen [66]. But we note that a key step is to strengthen the definition of $\mathbf{Dist}$ from §2.1.1 to be $\{\delta \mid \delta \in \mathbf{State} \to [0,1] \wedge \|\delta\| \leq 1\}$. This makes distributions correspond to subprobability measures, and it is easy to check that the semantics produces subprobability measures as output. The bottom element is then $\lambda\sigma \,.\, 0$, and the ordering relation on distributions is pointwise. Note that the definition of $\mathbf{Belief}$ from §2.1.4 remains unchanged, since it did not depend on $\mathbf{Dist}$. Thus beliefs still correspond to probability measures. Anywhere that the result of the program semantics must be upgraded to a belief (i.e., from a subprobability to a probability), we rely on the technique of §2.2.4 to handle nontermination. The most important occurrence of this is in step 5 of the experiment protocol in figure 2.2.

The final program construct is probabilistic choice, $S_1 \mathbin{{}_p[]} S_2$, where $0 \le p \le 1$. The semantics multiplies the probability of choosing a side $S_i$ with the frequency that $S_i$ produces a particular output state $\sigma'$. Since the same state $\sigma'$ might actually be produced by both sides of the choice, the frequency of its occurrence is the sum of the frequency from either side: $p \cdot (\llbracket S_1 \rrbracket\sigma)(\sigma') + (1 - p) \cdot (\llbracket S_2 \rrbracket\sigma)(\sigma')$, which can be simplified to the formula in figure 2.4.
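
A minimal sketch of this sum, assuming states are `(h, l)` pairs and distributions are dictionaries; the helper names (`scale`, `add`, `prob_choice`) and the two branch programs are hypothetical and chosen only to show a state that both sides can produce.

```python
def scale(p, delta):
    """Pointwise p * delta."""
    return {s: p * q for s, q in delta.items()}

def add(d1, d2):
    """Pointwise delta1 + delta2."""
    out = dict(d1)
    for s, q in d2.items():
        out[s] = out.get(s, 0.0) + q
    return out

def prob_choice(p, sem1, sem2):
    """State semantics of probabilistic choice, given state semantics of both sides."""
    def sem(sigma):
        return add(scale(p, sem1(sigma)), scale(1 - p, sem2(sigma)))
    return sem

def set_l_zero(sigma):          # hypothetical left side:  l := 0
    h, _ = sigma
    return {(h, 0): 1.0}

def copy_h_to_l(sigma):         # hypothetical right side: l := h
    h, _ = sigma
    return {(h, h): 1.0}

choice = prob_choice(0.25, set_l_zero, copy_h_to_l)
same_output = choice((0, 5))    # both sides yield state (0, 0)
diff_output = choice((1, 5))    # the sides yield different states
```

When `h` is 0 the two sides coincide, so the frequencies 0.25 and 0.75 add up on the single output state.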

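To lift the semantics in figure 2.4 and define $\llbracket S \rrbracket : \mathrm{Dist} \to \mathrm{Dist}$, we again employ lifting operator $*$:

\[
\begin{aligned}
\llbracket S \rrbracket\delta &= \llbracket S \rrbracket^* \delta \\
  &= \lambda\sigma .\ (\textstyle\sum \sigma' : \delta(\sigma') \cdot (\llbracket S \rrbracket\sigma')(\sigma)).
\end{aligned}
\]

Interpreting this definition, note there are many states $\sigma'$ in which $S$ could begin execution, and all of them could potentially terminate in state $\sigma$. So to compute $(\llbracket S \rrbracket\delta)(\sigma)$, we take a weighted average over all input states $\sigma'$. The weights are $\delta(\sigma')$, which describes how likely $\sigma'$ is to be used as the input state. With $\sigma'$ as input, $S$ terminates in state $\sigma$ with frequency $(\llbracket S \rrbracket\sigma')(\sigma)$.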

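Applying this definition to the semantics in figure 2.4 yields $\llbracket S \rrbracket\delta$, shown in figure 2.5. This lifted semantics corresponds directly to a semantics given by Kozen [66], which interprets programs as continuous linear operators on probability measures. Our semantics uses an extension of the distribution conditioning operator $|$ to Boolean expressions. Whereas distribution conditioning produces a normalized distribution, Boolean expression conditioning produces an unnormalized distribution:

\[ \delta|B = \lambda\sigma .\ \text{if } \llbracket B \rrbracket\sigma \text{ then } \delta(\sigma) \text{ else } 0. \]

By producing unnormalized distributions as part of the meaning of if and while statements, we track the frequency with which each branch of the statement is chosen.

\[
\begin{array}{rcl}
\llbracket \text{skip} \rrbracket\delta &=& \delta \\
\llbracket v := E \rrbracket\delta &=& \delta[v \mapsto E] \\
\llbracket S_1; S_2 \rrbracket\delta &=& \llbracket S_2 \rrbracket(\llbracket S_1 \rrbracket\delta) \\
\llbracket \text{if } B \text{ then } S_1 \text{ else } S_2 \rrbracket\delta &=& \llbracket S_1 \rrbracket(\delta|B) + \llbracket S_2 \rrbracket(\delta|\neg B) \\
\llbracket \text{while } B \text{ do } S \rrbracket &=& \mathit{fix}(\lambda d : \mathrm{Dist} \to \mathrm{Dist} .\ \lambda\delta .\ d(\llbracket S \rrbracket(\delta|B)) + (\delta|\neg B)) \\
\llbracket S_1 \mathbin{{}_p[]} S_2 \rrbracket\delta &=& \llbracket S_1 \rrbracket(p \cdot \delta) + \llbracket S_2 \rrbracket((1 - p) \cdot \delta)
\end{array}
\]

Figure 2.5: Distribution semantics of programs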
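
The following sketch illustrates Boolean expression conditioning and the distribution semantics of if. It assumes states are `(h, l)` pairs and distributions are dictionaries in which absent states have probability 0; the helper names and the example program are hypothetical.

```python
def add(d1, d2):
    """Pointwise delta1 + delta2."""
    out = dict(d1)
    for s, q in d2.items():
        out[s] = out.get(s, 0.0) + q
    return out

def condition_bool(delta, pred):
    """delta | B: keep the mass of states satisfying B, without renormalizing."""
    return {s: q for s, q in delta.items() if pred(s)}

def assign_l(value):
    """Distribution semantics of the assignment l := value."""
    def sem(delta):
        out = {}
        for (h, _), q in delta.items():
            out[(h, value)] = out.get((h, value), 0.0) + q
        return out
    return sem

def if_then_else(pred, sem1, sem2):
    """Distribution semantics of if: run each branch on its share of the mass."""
    def sem(delta):
        taken = condition_bool(delta, pred)
        skipped = condition_bool(delta, lambda s: not pred(s))
        return add(sem1(taken), sem2(skipped))
    return sem

h_is_zero = lambda s: s[0] == 0
delta = {(0, 9): 0.25, (1, 9): 0.75}

# Total mass of delta | B is the frequency with which the then-branch is chosen.
branch_mass = sum(condition_bool(delta, h_is_zero).values())
result = if_then_else(h_is_zero, assign_l(0), assign_l(1))(delta)
```

Because `condition_bool` does not renormalize, the two branch outputs can simply be added to obtain the output distribution.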


2.5  Insider  Choice 

The experiment protocol in §2.2 involved two agents, the attacker and the system. Consider a third agent called the insider, whose goal is to help the attacker learn secret information. The insider and attacker might initially communicate to establish a strategy to achieve this goal. Once execution begins, the insider cannot directly communicate with the attacker, but the insider can observe the entire program state and can influence execution.

The insider's ability to influence execution is modeled by a new programming language construct, insider choice, denoted $S_1 \mathbin{[]} S_2$:

\[ S ::= \ldots \mid S_1 \mathbin{[]} S_2 \]

The insider, rather than the system, is the entity who executes this kind of choice. The insider chooses either $S_1$ or $S_2$ and execution continues with the chosen program.

As an example of insider choice, consider program LI:

LI :  h := h mod 2;
      l := 0 [] l := 1

The second line of LI allows the insider to choose between two values for variable $l$. Since the insider is allowed to observe the high component of the state, he can observe the parity of $h$ and choose to set $l$ equal to it, thus leaking the parity of $h$.

The  insider  in  this  example  made  a  deterministic  choice.  More  generally, 
insiders  may  also  make  probabilistic  choices.  For  example,  an  insider  could  flip 
a  fair  coin  then  choose  the  left  side  on  heads  or  the  right  side  on  tails.  This 
can  be  seen  as  an  extension  of  probabilistic  choice,  in  which  the  probability  is 
a  function  of  the  program  state  rather  than  just  a  constant.  Thus  insider  choice 
can  model  the  behavior  of  probabilistic  programs  that  are  not  influenced  by  an 
insider. 

2.5.1  Insider  Functions 

Formally, an insider is a function $I \in \mathrm{Insider}$, where

\[ \mathrm{Insider} \triangleq \mathrm{State} \to [0, 1]. \]

$I(\sigma)$ is the probability with which the left-hand side of the insider choice is taken. For example, insider function $I_{LI}$ leaks the value of $h$ in program LI with probability 0.99:

\[ I_{LI}(\sigma) = \text{if } \sigma(h) = 0 \text{ then } 0.99 \text{ else } 0.01 \]

In a program with multiple syntactic occurrences of insider choice, a single insider function can encode different probabilities for each occurrence if the program state encodes the program counter.
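
As an illustration, program LI can be run under insider function $I_{LI}$ with a small Python sketch. The encoding is an assumption made for the example: states are `(h, l)` pairs, distributions are dictionaries, and the combinator names (`seq`, `insider_choice`, and so on) are hypothetical.

```python
def scale(p, delta):
    return {s: p * q for s, q in delta.items()}

def add(d1, d2):
    out = dict(d1)
    for s, q in d2.items():
        out[s] = out.get(s, 0.0) + q
    return out

def lift(d):
    """Lift a state semantics to distributions (weighted sum over input states)."""
    def d_star(delta):
        out = {}
        for sigma_prime, weight in delta.items():
            out = add(out, scale(weight, d(sigma_prime)))
        return out
    return d_star

def seq(sem1, sem2):
    """S1; S2 on states: run S1, then the lifted S2 on its output distribution."""
    return lambda sigma: lift(sem2)(sem1(sigma))

def insider_choice(I, sem1, sem2):
    """Take the left side with probability I(sigma), the right side otherwise."""
    def sem(sigma):
        p = I(sigma)
        return add(scale(p, sem1(sigma)), scale(1 - p, sem2(sigma)))
    return sem

def mod2(sigma):                     # h := h mod 2
    h, l = sigma
    return {(h % 2, l): 1.0}

def set_l(value):                    # l := value
    return lambda sigma: {(sigma[0], value): 1.0}

def I_LI(sigma):                     # insider function from the text
    return 0.99 if sigma[0] == 0 else 0.01

LI = seq(mod2, insider_choice(I_LI, set_l(0), set_l(1)))
out = LI((3, 0))                     # h is odd, so l = 1 should be very likely
```

The insider consults the state after `h := h mod 2` has executed, which is why an odd initial `h` makes the right-hand side (`l := 1`) the likely choice.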

Moreover, if the program state is sufficiently rich, insider functions can model a range of insider capabilities. For example, suppose the operational semantics guarantees that for every variable $x$, the previous value of $x$ (i.e., the value that was assigned to it before its current value was assigned) is preserved in variable $\bar{x}$. Then insider functions can make decisions based on past state by reading those previous values.⁷ In the following program, the insider leaks the initial parity of $h$:

LP :  h := h mod 2;
      h := 0;
      l := 0 [] l := 1

The insider function that accomplishes this is

\[ I_{LP}(\sigma) = \text{if } \sigma(\bar{h}) = 0 \text{ then } 1 \text{ else } 0. \]

Note that without access to variable $\bar{h}$, the insider is unable to leak the initial parity of $h$ because this information is removed from the state when $h$ is assigned the value 0.

Insiders with limited computational resources can be modeled by further restricting Insider. For example, suppose that insiders are allowed only polynomial time to make a choice. Then insider functions could be replaced by polynomially time-bounded Turing machines, where the input to the machine is the input $\sigma$ to the insider function, and the output of the machine is used as the output of the insider function.

⁷ This mechanism is similar to history variables [1]. Likewise, insiders who can predict the future values of variables could be modeled by a mechanism similar to prophecy variables [1].
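\[
\begin{array}{rcl}
\llbracket \text{skip} \rrbracket_I \sigma &=& \dot{\sigma} \\
\llbracket v := E \rrbracket_I \sigma &=& \dot{\sigma}[v \mapsto E] \\
\llbracket S_1; S_2 \rrbracket_I \sigma &=& (\llbracket S_2 \rrbracket_I)^*(\llbracket S_1 \rrbracket_I \sigma) \\
\llbracket \text{if } B \text{ then } S_1 \text{ else } S_2 \rrbracket_I \sigma &=& \text{if } \llbracket B \rrbracket\sigma \text{ then } \llbracket S_1 \rrbracket_I \sigma \text{ else } \llbracket S_2 \rrbracket_I \sigma \\
\llbracket \text{while } B \text{ do } S \rrbracket_I &=& \mathit{fix}(\lambda d : \mathrm{State} \to \mathrm{Dist} .\ \lambda\sigma .\ \text{if } \llbracket B \rrbracket\sigma \text{ then } d^*(\llbracket S \rrbracket_I \sigma) \text{ else } \dot{\sigma}) \\
\llbracket S_1 \mathbin{{}_p[]} S_2 \rrbracket_I \sigma &=& p \cdot \llbracket S_1 \rrbracket_I \sigma + (1 - p) \cdot \llbracket S_2 \rrbracket_I \sigma \\
\llbracket S_1 \mathbin{[]} S_2 \rrbracket_I \sigma &=& I(\sigma) \cdot \llbracket S_1 \rrbracket_I \sigma + (1 - I(\sigma)) \cdot \llbracket S_2 \rrbracket_I \sigma
\end{array}
\]

Figure 2.6: State semantics of programs with insider


\[
\begin{array}{rcl}
\llbracket \text{skip} \rrbracket_I \delta &=& \delta \\
\llbracket v := E \rrbracket_I \delta &=& \delta[v \mapsto E] \\
\llbracket S_1; S_2 \rrbracket_I \delta &=& \llbracket S_2 \rrbracket_I(\llbracket S_1 \rrbracket_I \delta) \\
\llbracket \text{if } B \text{ then } S_1 \text{ else } S_2 \rrbracket_I \delta &=& \llbracket S_1 \rrbracket_I(\delta|B) + \llbracket S_2 \rrbracket_I(\delta|\neg B) \\
\llbracket \text{while } B \text{ do } S \rrbracket_I &=& \mathit{fix}(\lambda d : \mathrm{Dist} \to \mathrm{Dist} .\ \lambda\delta .\ d(\llbracket S \rrbracket_I(\delta|B)) + (\delta|\neg B)) \\
\llbracket S_1 \mathbin{{}_p[]} S_2 \rrbracket_I \delta &=& \llbracket S_1 \rrbracket_I(p \cdot \delta) + \llbracket S_2 \rrbracket_I((1 - p) \cdot \delta) \\
\llbracket S_1 \mathbin{[]} S_2 \rrbracket_I \delta &=& \llbracket S_1 \rrbracket_I(I(\delta)) + \llbracket S_2 \rrbracket_I(\bar{I}(\delta))
\end{array}
\]

Figure 2.7: Distribution semantics of programs with insider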


2.5.2  Semantics  and  Experiments 


Formal semantics $\llbracket S \rrbracket : \mathrm{Insider} \to \mathrm{State} \to \mathrm{Dist}$ is given in figure 2.6. The only place in the semantics that the insider function is used is in the semantics of $S_1 \mathbin{[]} S_2$, and the semantics never modifies the insider function. Because of this second-class nature of insider functions, and for improved readability, we use a subscript notation for the insider function $I$ in semantics $\llbracket S \rrbracket_I$. We can lift the semantics to operate on distributions as shown in figure 2.7. The lifted insider function is defined as follows:

\[
\begin{aligned}
I(\delta) &= \lambda\sigma .\ I(\sigma) \cdot \delta(\sigma), \\
\bar{I}(\delta) &= \lambda\sigma .\ (1 - I(\sigma)) \cdot \delta(\sigma).
\end{aligned}
\]
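
The two lifted insider functions split the mass of each state between the two sides of the choice. A minimal sketch, assuming `(h, l)` states and dictionary distributions, with hypothetical helper names:

```python
def add(d1, d2):
    out = dict(d1)
    for s, q in d2.items():
        out[s] = out.get(s, 0.0) + q
    return out

def insider_left(I, delta):
    """Share of each state's mass sent to the left side: I(sigma) * delta(sigma)."""
    return {s: I(s) * q for s, q in delta.items()}

def insider_right(I, delta):
    """Share sent to the right side: (1 - I(sigma)) * delta(sigma)."""
    return {s: (1 - I(s)) * q for s, q in delta.items()}

def assign_l(value):
    """Distribution semantics of l := value."""
    def sem(delta):
        out = {}
        for (h, _), q in delta.items():
            out[(h, value)] = out.get((h, value), 0.0) + q
        return out
    return sem

def insider_choice_dist(I, sem1, sem2):
    """Distribution semantics of insider choice, as in the last line of figure 2.7."""
    return lambda delta: add(sem1(insider_left(I, delta)), sem2(insider_right(I, delta)))

def I_LI(sigma):
    return 0.99 if sigma[0] == 0 else 0.01

delta = {(0, 0): 0.5, (1, 0): 0.5}
left, right = insider_left(I_LI, delta), insider_right(I_LI, delta)
out = insider_choice_dist(I_LI, assign_l(0), assign_l(1))(delta)
```

Adding `left` and `right` pointwise recovers `delta`, so the insider redistributes mass between the branches without creating or destroying any.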

The experiment protocol in §2.2.1 can be extended to include insiders, as shown in figure 2.8. Note that the attacker uses insider function $I$ when conducting the thought-experiment. This function thus encodes choices that the insider and attacker have agreed upon in advance.

An experiment $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L, I)$ is conducted as follows.

1. The attacker chooses a prebelief $b_H$ about the high state.

2. (a) The system picks a high state $\sigma_H$.
   (b) The attacker picks a low state $\sigma_L$.

3. The attacker predicts the output distribution: $\delta'_A = \llbracket S \rrbracket_I(\dot{\sigma}_L \otimes b_H)$.

4. The system and insider execute the program $S$, which produces a state $\sigma' \in \delta'$ as output, where $\delta' = \llbracket S \rrbracket_I(\dot{\sigma}_L \otimes \dot{\sigma}_H)$. The attacker observes the low projection of the output state: $o = \sigma' \upharpoonright L$.

5. The attacker infers a postbelief: $b'_H = (\delta'_A | o) \upharpoonright H$.

Figure 2.8: Experiment protocol with insider
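
Steps 3 and 5 of the protocol can be sketched for program LI and insider function $I_{LI}$ from §2.5.1. The encoding below is an assumption for illustration only: states are `(h, l)` pairs, beliefs and distributions are dictionaries, the prebelief is uniform over one bit, and the observation `o = 1` is a hypothetical outcome of step 4.

```python
def I_LI(sigma):
    return 0.99 if sigma[0] == 0 else 0.01

def LI(I):
    """State semantics of LI: h := h mod 2; (l := 0 [] l := 1) under insider I."""
    def sem(sigma):
        h = sigma[0] % 2
        p = I((h, sigma[1]))            # the insider observes the current state
        return {(h, 0): p, (h, 1): 1 - p}
    return sem

def lift(d):
    def d_star(delta):
        out = {}
        for sigma_prime, weight in delta.items():
            for sigma, prob in d(sigma_prime).items():
                out[sigma] = out.get(sigma, 0.0) + weight * prob
        return out
    return d_star

def product(low, belief):
    """Point mass on the low state combined with a belief about the high state."""
    return {(h, low): q for h, q in belief.items()}

def postbelief(delta_a, o):
    """Condition the prediction on low observation o, then project to high."""
    mass = sum(q for (_, l), q in delta_a.items() if l == o)
    post = {}
    for (h, l), q in delta_a.items():
        if l == o:
            post[h] = post.get(h, 0.0) + q / mass
    return post

b_H = {0: 0.5, 1: 0.5}                           # step 1: prebelief
delta_A = lift(LI(I_LI))(product(0, b_H))        # step 3: attacker's prediction
b_H_post = postbelief(delta_A, 1)                # step 5: after observing l = 1
```

Because the attacker's prediction uses the agreed insider function, observing `l = 1` shifts almost all of the attacker's belief to `h = 1`.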


2.5.3  Security  Conditions 

Observational determinism [85,102,130] is a security condition for nondeterministic systems that generalizes noninterference [46]. We can state a probabilistic generalization of observational determinism that is applicable to our insider model: a program $S$ satisfies observational determinism exactly when $S$ behaves as a function from a low input state to a low output distribution, for any insider and high input. Let the set of programs satisfying observational determinism be denoted ObsDet, which is defined as follows:

\[ \mathrm{ObsDet} \triangleq \{S \mid \forall I .\ \forall\sigma_L .\ \exists\delta_L .\ \forall\sigma_H .\ \llbracket S \rrbracket_I(\dot{\sigma}_L \otimes \dot{\sigma}_H) \upharpoonright L = \delta_L\}. \]

Observational determinism is equivalent to zero information flow in the insider model; that is, a program $S$ satisfies observational determinism exactly when all experiments over $S$ leak exactly 0 bits of information:

Theorem 2.9. $S \in \mathrm{ObsDet} \;\equiv\; \forall \mathcal{E}, b'_H \in \mathcal{B}(\mathcal{E}) .\ Q(\mathcal{E}, b'_H) = 0$.

Proof. In appendix 2.A. □

Theorem  2.9  suggests  that  observational  determinism  is  the  absolute  security 
condition  for  nondeterministic  systems.  On  the  other  hand,  the  theorem  also 
shows  that  observational  determinism  is  too  strong  to  be  useful  with  programs 
that  require  information  flow,  such  as  PWC. 

Other nondeterministic security conditions, such as generalized noninterference (GNI) [81], are already known to allow leakage of information [119]. Our model of insider choice allows this leakage to be quantified: a program $S$ satisfies GNI when $S$ behaves as a relation on a low input state and low output distributions, for any insider and high input:

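\[ \mathrm{GNI} \triangleq \{S \mid \forall\sigma_L .\ \exists\Delta_L .\ \forall\sigma_H .\ (\textstyle\bigcup_I \llbracket S \rrbracket_I(\dot{\sigma}_L \otimes \dot{\sigma}_H) \upharpoonright L) = \Delta_L\}. \]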

Consider program LH, which can be shown to be in GNI:

LH :  l := h [] (l := 0 [] l := 1)

Using insider function $I_{LH}(\sigma) = 1$, this program always leaks the value of $h$. Unless the attacker already has a perfectly accurate belief about $h$, this is a positive (and non-zero) amount of leakage. So even though the program is secure according to GNI, an insider can refine the program to be insecure. This weakness is known as the refinement paradox [102]. Insiders therefore introduce a kind of nondeterminism that is not secure under refinement.
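
As a worked illustration of this leakage, the following calculation is a sketch under two stated assumptions: the attacker's prebelief $b_H$ assigns positive probability to the actual high state $\sigma_H$, and accuracy is measured as in the proof of theorem 2.2, where $D(b \to \dot{\sigma}_H) = -\lg b(\sigma_H)$. With $I_{LH}(\sigma) = 1$ the program behaves as $l := h$, so the observation determines $h$ and the postbelief $b'_H$ is the point mass at $\sigma_H$:

\[
Q(\mathcal{E}, b'_H) = D(b_H \to \dot{\sigma}_H) - D(b'_H \to \dot{\sigma}_H) = -\lg b_H(\sigma_H) + \lg 1 = -\lg b_H(\sigma_H).
\]

For instance, if $h$ is a single bit and the prebelief is uniform, the flow is $-\lg 0.5 = 1$ bit; the flow is 0 only when $b_H(\sigma_H) = 1$.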
2.6 Related Work


Quantification  of  information  flow.  The  first  published  connection  between 
information  theory  and  information  flow  is  by  Denning  [35],  who  uses  entropy 
to  calculate  the  leakage  of  a  few  assignment  and  conditional  statements. 

Backes, Köpf, and Rybalchenko [11] construct an automated static analysis
for  computing  the  quantity  of  information  flow  in  simple  imperative  programs. 
Their  analysis  assumes  a  uniform  distribution  on  high  inputs,  computes  a  high 
equivalence  relation  on  low  observable  outputs,  then  counts  the  number  of  high 
inputs  in  each  equivalence  class.  This  count  yields  a  probability  distribution  that 
can  be  used  to  compute  several  entropy-based  metrics  of  information  flow. 

Smith [108] argues that the function used to quantify uncertainty should depend on the attack model. For some programs, the expectation taken as part of
the  formula  for  mutual  information  masks  the  fact  that  certain  executions  leak 
a  large  amount  of  information,  thus  making  it  easy  for  the  attacker  to  guess  the 
remaining  secret  information.  Our  framework  in  part  addresses  this  problem 
by  allowing  quantification  of  information  flow  both  for  single  experiments  and 
in  expectation  over  all  experiments. 

McCamant  and  Ernst  [80]  implement  an  automated  hybrid  analysis  for 
quantification of information flow in Linux/x86 binaries. Their analysis computes a conservative upper bound on the amount of information that can be
leaked  by  the  particular  execution  the  dynamic  part  of  the  analysis  observes. 
But  the  analysis  cannot  bound  the  quantity  of  flow  for  executions  it  does  not 
observe.  The  quantity  measured  by  the  analysis  is  an  upper  bound  on  channel 
capacity,  which  is  the  maximum  amount,  over  any  probability  distribution  on 
inputs,  of  mutual  information  between  secret  inputs  and  public  outputs. 
Köpf and Basin [65] quantify the resistance of a deterministic system against sequences of attacks, where resistance is a function from the number of attacks performed to the expected remaining uncertainty of the attacker. Their definitions can quantify uncertainty with several variants of entropy. They give an automated, heuristic analysis that approximates resistance.

Clark, Hunt, and Malacaria [24] develop a static analysis that bounds the amount of information leaked by a while-program. Their metric for information leakage is based on conditional entropy. The analysis comprises a dataflow analysis, which computes a use-def graph, and syntax-directed inference rules, which calculate leakage bounds. These authors also investigate other leakage metrics, settling on conditional mutual information as an appropriate metric for quantification of flow in probabilistic languages [23]; they do not consider relative entropy. Mutual information is always at least 0, so unlike relative entropy it cannot represent misinformation. As noted in §2.3.4, this uncertainty-based definition requires a strong admissibility restriction: the attacker's prebelief must be the same distribution from which the system generates the high input. Malacaria [77] extends this line of work by classifying the rate of leakage of loops. His basic definition of amount of leakage is equivalent to [24], so it is an instance of our own definition, as shown in §2.3.4. For the same reason, Malacaria's model is no more precise than our own model. Rate of leakage could be defined in our own model, like the other statistics in §2.3.

Backes [10] quantifies information flow for reactive systems, which execute cryptographic protocols, as the maximum distance between the low user's views of a protocol run for any two high behaviors, where a view is a probability distribution on the traces observed by the user. The distance metric is left abstract, hence not instantiated by any information-theoretic definition.
Di Pierro, Hankin, and Wiklicky [38] relax noninterference to approximate noninterference, where "approximate" denotes similarity of two processes in a process algebra; similarity is quantified using the supremum norm of the difference between the probability distributions that the processes create on memory. This quantity can be interpreted as a probability on an attacker's ability to distinguish the two processes using a finite number of tests. This work also builds an abstract interpretation that allows approximation of the confinement of a process. Subsequent work [39] generalizes from process algebras to probabilistic transition systems.

Lowe  [76]  defines  the  information  flow  quantity  of  a  process  with  two  users  H 
and  L  to  be  the  number  of  behaviors  of  H  that  L  can  distinguish.  When  there 
are  n  such  distinguishable  behaviors,  H  can  use  them  to  transmit  lg  n  bits  to  L. 

Weber [123] defines n-limited security, which allows declassification at a rate that depends, in part, on the size n of a buffer shared by the high and low projections of a state.

Millen [88], using deterministic state machines, proves that a system satisfies noninterference exactly when the mutual information between certain inputs and outputs is zero. He also proposes mutual information as a metric for information flow, but he does not show how to compute the amount of flow for programs.

Database privacy. Evfimievski, Gehrke, and Srikant [42] quantify privacy breaches in data mining. In their framework, randomized operators are applied to confidential data before the data is released. A privacy breach occurs when release of the randomized data causes a large change in an attacker's probability distribution on a property of the confidential data. They use Bayesian reasoning, based on observation of randomized data, to update the attacker's distribution. Their distributions are similar to our beliefs, but have the same strong admissibility restriction as Clark et al. [24] (cf. §2.3.4). They also show that relative entropy can be used to bound the maximum privacy breach for a randomized operator.

Anonymity  protocols.  Chatzikokolakis  et  al.  [21]  analyze  the  degree  of 
anonymity provided by anonymity protocols. They model protocols as channels, and they quantify the loss of anonymity introduced by a protocol as the
information-theoretic  capacity  of  the  channel. 

Noninterference.  The  flow  model  (FM)  is  a  security  property  proposed  by 
McLean  [84]  and  later  given  a  quantitative  formalization  by  Gray  [49],  who 
called  it  the  Applied  Flow  Model.  The  FM  stipulates  that  the  probability  of  a 
low  output  may  depend  on  previous  low  outputs,  but  not  on  previous  high 
outputs.  Gray  formalizes  this  in  the  context  of  probabilistic  state  machines,  and 
he  relates  noninterference  to  the  maximum  rate  of  flow  between  high  and  low. 

Browne  [19]  develops  a  novel  application  of  the  Turing  test:  a  system  passes 
Browne's  Turing  test  exactly  when  for  all  finite  lengths  of  time,  the  information 
flow  over  that  time  is  zero. 

Volpano [118] gives a type system that can be used to establish the security of password checking and one-way functions such as MD5 and SHA1. Noninterference does not usually allow such functions to be typed, so this type system is an improvement over previous type systems. However, the type system does not allow a general analysis of quantitative information flow.
Volpano and Smith [120] give a type system that enforces relative secrecy, which requires that well-typed programs cannot leak confidential data in polynomial time.

Nondeterminism. Wittbold and Johnson [127] introduce nondeducibility on strategies, an extension of Sutherland's nondeducibility [113]. Wittbold and Johnson observe that if a program is run multiple times and feedback between runs is allowed, information can be leaked by coding schemes across multiple runs. A system that is nondeducible on strategies has no noiseless communication channels between high input and low output, even in the presence of feedback. Our insider framework can quantify the leakage due to strategies that are encodable as insider functions.

Halpern and Tuttle [53] introduce a framework for reasoning about knowledge and probability based on three kinds of adversaries: adversaries who make nondeterministic choices, adversaries who represent the knowledge of the opponent, and adversaries who control timing. Our insiders can be seen as an instantiation of this framework. The insider choice and insider function constitute an adversary who makes nondeterministic choices, and each of the models of the insider's power in §2.5.1 corresponds to an adversary representing the knowledge of the opponent. Gray and Syverson [50] apply the Halpern-Tuttle framework to reason about qualitative security of probabilistic systems. They relate their security condition to probabilistic noninterference [49] and information theory. Halpern and O'Neill [51] construct a framework for reasoning about secrecy that generalizes many previous results on qualitative and probabilistic, but not quantitative, security. Their framework, like ours, uses subjective probability distributions.
McIver and Morgan [82] calculate the channel capacity of a program using conditional entropy. They add demonic nondeterminism as well as probabilistic choice to the language of while-programs, and they show that whether a program is perfectly secure (i.e., leaks 0 bits) is determined by the behavior of its deterministic refinements. They also consider restricting the observational power of the demon making the nondeterministic choices.

2.7  Summary 

This chapter presents a model for incorporating attacker beliefs into analysis of quantitative information flow. Our theory reveals that uncertainty, the traditional metric for information flow, is inadequate. Information flows when an attacker's belief becomes more accurate, but an uncertainty metric can mistakenly report a flow of zero or less. Inversely, misinformation flows when an attacker's belief becomes less accurate, but an uncertainty metric can mistakenly report a positive information flow. Hence, in the presence of beliefs, accuracy is the correct metric for information flow.

We  have  shown  how  to  use  an  accuracy  metric  to  calculate  exact,  expected, 
and  maximum  information  flow;  other  statistics  of  information  flow,  such  as 
variance, median, and rate, could be defined in the same way. We have demonstrated that our metric generalizes uncertainty metrics. Our formal model of
experiments  enables  precise,  compositional  reasoning  about  attackers'  actions 
and  beliefs.  We  have  instantiated  this  model  with  a  probabilistic  semantics  and 
have  shown  that  probabilistic  choice  is  essential  to  producing  misinformation. 
We  have  also  extended  the  model  to  enable  analysis  of  information  flow  caused 
by  insiders  who  collude  with  attackers. 
2.A Appendix: Proofs


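Theorem 2.1. $B(\mathcal{E}, o)(\sigma_H) = BI(\mathcal{E}, o)$.

Proof.

\begin{align*}
& BI(\mathcal{E}, o) \\
={}& \langle\ \text{Definition of } BI\ \rangle \\
& \frac{b_H(\sigma_H) \cdot (\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H) \upharpoonright L)(o)}{(\sum \sigma'_H : b_H(\sigma'_H) \cdot (\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}'_H) \upharpoonright L)(o))} \\
={}& \langle\ \text{Definition of } \delta \upharpoonright L \text{, apply distribution to } o\ \rangle \\
& \frac{b_H(\sigma_H) \cdot (\sum \sigma : \sigma \upharpoonright L = o : (\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H))(\sigma))}{(\sum \sigma'_H : b_H(\sigma'_H) \cdot (\sum \sigma : \sigma \upharpoonright L = o : (\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}'_H))(\sigma)))} \\
={}& \langle\ \text{Lemma 2.1 (below)}\ \rangle \\
& \frac{b_H(\sigma_H) \cdot (\sum \sigma : \sigma \upharpoonright L = o : (\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H))(\sigma))}{(\sum \sigma' : \sigma' \upharpoonright L = o : \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma'))} \\
={}& \langle\ \text{Distributivity, one-point rule}\ \rangle \\
& \frac{(\sum \sigma : \sigma \upharpoonright L = o \wedge \sigma \upharpoonright H = \sigma_H : (\sum \sigma'_H : b_H(\sigma'_H) \cdot \llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}'_H)(\sigma)))}{(\sum \sigma' : \sigma' \upharpoonright L = o : \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma'))} \\
={}& \langle\ \text{Lemma 2.1 (below)}\ \rangle \\
& \frac{(\sum \sigma : \sigma \upharpoonright L = o \wedge \sigma \upharpoonright H = \sigma_H : \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma))}{(\sum \sigma' : \sigma' \upharpoonright L = o : \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma'))} \\
={}& \langle\ \text{Distributivity}\ \rangle \\
& (\textstyle\sum \sigma : \sigma \upharpoonright L = o \wedge \sigma \upharpoonright H = \sigma_H : \dfrac{\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma)}{(\sum \sigma' : \sigma' \upharpoonright L = o : \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma'))}) \\
={}& \langle\ \text{Definition of } \delta \upharpoonright L\ \rangle \\
& (\textstyle\sum \sigma : \sigma \upharpoonright H = \sigma_H : ((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)) | o)(\sigma)) \\
={}& \langle\ \text{Definition of } \delta \upharpoonright H \text{, applying distribution to } \sigma_H\ \rangle \\
& (((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)) | o) \upharpoonright H)(\sigma_H) \\
={}& \langle\ \text{Definition of } B(\mathcal{E}, o)\ \rangle \\
& B(\mathcal{E}, o)(\sigma_H)
\end{align*}

□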

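Lemma 2.1. Let $\sigma \upharpoonright L = o$. Then:

\[ \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma) = (\textstyle\sum \sigma_H : b_H(\sigma_H) \cdot \llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H)(\sigma)). \]

Proof.

\begin{align*}
& \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma) \\
={}& \langle\ \text{Definition of } \llbracket S \rrbracket\delta\ \rangle \\
& (\textstyle\sum \sigma' : (\dot{\sigma}_L \otimes b_H)(\sigma') \cdot (\llbracket S \rrbracket\sigma')(\sigma)) \\
={}& \langle\ \text{Definition of point mass}\ \rangle \\
& (\textstyle\sum \sigma' : \sigma' \upharpoonright L = \sigma_L : b_H(\sigma' \upharpoonright H) \cdot (\llbracket S \rrbracket\sigma')(\sigma)) \\
={}& \langle\ \text{Let } \sigma' = \sigma_L \cup \sigma_H \text{, nesting, one-point rule}\ \rangle \\
& (\textstyle\sum \sigma_H : b_H(\sigma_H) \cdot \llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H)(\sigma))
\end{align*}

□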


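Theorem 2.2. $Q(\mathcal{E}, b'_H) = \mathcal{I}_{\delta_A}(o) - \mathcal{I}_{\delta_S}(o)$.

Proof.

\begin{align*}
& Q(\mathcal{E}, b'_H) \\
={}& \langle\ \text{Definition of } Q\ \rangle \\
& D(b_H \to \dot{\sigma}_H) - D(b'_H \to \dot{\sigma}_H) \\
={}& \langle\ \text{Definitions of } D \text{ and point mass}\ \rangle \\
& -\lg b_H(\sigma_H) + \lg b'_H(\sigma_H) \\
={}& \langle\ \text{Lemma 2.2 (below), properties of } \lg\ \rangle \\
& -\lg \Pr\nolimits_{\delta_A}(o) + \lg \Pr\nolimits_{\delta_S}(o) \\
={}& \langle\ \text{Definition of } \mathcal{I}\ \rangle \\
& \mathcal{I}_{\delta_A}(o) - \mathcal{I}_{\delta_S}(o)
\end{align*}

□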


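Lemma 2.2. $b'_H(\sigma_H) = b_H(\sigma_H) \cdot \dfrac{\delta_S(o)}{\delta_A(o)}$.

Proof.

\begin{align*}
& b'_H(\sigma_H) \\
={}& \langle\ \text{Definition of } b'_H \text{ in experiment protocol}\ \rangle \\
& ((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) | o) \upharpoonright H)(\sigma_H) \\
={}& \langle\ \text{Definition of } \delta \upharpoonright H\ \rangle \\
& (\textstyle\sum \sigma : \sigma \upharpoonright H = \sigma_H : (\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) | o)(\sigma)) \\
={}& \langle\ \text{Definition of } \delta | o\ \rangle \\
& (\textstyle\sum \sigma : \sigma \upharpoonright H = \sigma_H \wedge \sigma \upharpoonright L = o : \dfrac{\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma)}{(\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) \upharpoonright L)(o)}) \\
={}& \langle\ \text{One-point rule: } \sigma = o \cup \sigma_H\ \rangle \\
& \frac{\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(o \cup \sigma_H)}{(\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) \upharpoonright L)(o)} \\
={}& \langle\ \text{Definition of } \delta_A\ \rangle \\
& \frac{1}{\delta_A(o)} \cdot \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(o \cup \sigma_H) \\
={}& \langle\ \text{Definition of } \llbracket S \rrbracket\delta\ \rangle \\
& \frac{1}{\delta_A(o)} \cdot (\textstyle\sum \sigma' : (\dot{\sigma}_L \otimes b_H)(\sigma') \cdot (\llbracket S \rrbracket\sigma')(o \cup \sigma_H)) \\
={}& \langle\ \text{Definition of } \otimes \text{, point mass}\ \rangle \\
& \frac{1}{\delta_A(o)} \cdot (\textstyle\sum \sigma' : \sigma' \upharpoonright L = \sigma_L : b_H(\sigma' \upharpoonright H) \cdot (\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{(\sigma' \upharpoonright H)}))(o \cup \sigma_H)) \\
={}& \langle\ \text{High input is immutable}\ \rangle \\
& \frac{1}{\delta_A(o)} \cdot (\textstyle\sum \sigma' : \sigma' \upharpoonright L = \sigma_L \wedge \sigma' \upharpoonright H = \sigma_H : b_H(\sigma' \upharpoonright H) \cdot (\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{(\sigma' \upharpoonright H)}))(o \cup \sigma_H)) \\
={}& \langle\ \text{One-point rule: } \sigma' = \sigma_L \cup \sigma_H\ \rangle \\
& \frac{1}{\delta_A(o)} \cdot b_H(\sigma_H) \cdot (\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H))(o \cup \sigma_H) \\
={}& \langle\ \text{High input is immutable. Definition of } \delta \upharpoonright L\ \rangle \\
& \frac{1}{\delta_A(o)} \cdot b_H(\sigma_H) \cdot ((\llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H)) \upharpoonright L)(o) \\
={}& \langle\ \text{Definition of } \delta_S\ \rangle \\
& b_H(\sigma_H) \cdot \frac{\delta_S(o)}{\delta_A(o)}
\end{align*}

□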


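Theorem 2.3. Let $\mathcal{E} = (S, b_H, \sigma_H, \sigma_L)$. Then:

\[ Q(\mathcal{E}, b'_H) = k \;\equiv\; b'_H(\sigma_H) = 2^k \cdot b_H(\sigma_H). \]

Proof.

\begin{align*}
& Q(\mathcal{E}, b'_H) = k \\
\equiv{}& \langle\ \text{Definition of } Q\ \rangle \\
& D(b_H \to \dot{\sigma}_H) - D(b'_H \to \dot{\sigma}_H) = k \\
\equiv{}& \langle\ \text{Definition of } D\ \rangle \\
& -(\lg b_H(\sigma_H) - \lg b'_H(\sigma_H)) = k \\
\equiv{}& \langle\ \text{Arithmetic, properties of } \log\ \rangle \\
& b'_H(\sigma_H) = 2^k \cdot b_H(\sigma_H)
\end{align*}

□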
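
Theorem 2.3 can be checked numerically. The sketch below assumes, as in the proof above, that the flow is the difference of base-2 logarithms of the probabilities that the postbelief and prebelief assign to the actual high state; the function name `flow_bits` and the example beliefs are hypothetical.

```python
import math

def flow_bits(prebelief, postbelief, sigma_h):
    """Q for a single experiment: lg b'_H(sigma_H) - lg b_H(sigma_H)."""
    return math.log2(postbelief[sigma_h]) - math.log2(prebelief[sigma_h])

pre = {0: 0.5, 1: 0.5}
post = {0: 0.99, 1: 0.01}

k_informed = flow_bits(pre, post, 0)     # the actual high state is 0
k_misled = flow_bits(pre, post, 1)       # the actual high state is 1

# Theorem 2.3: the postbelief probability is 2^k times the prebelief probability.
recovered = (2 ** k_informed) * pre[0]
```

When the postbelief moves toward the actual high state the flow is positive, and when it moves away the flow is negative, which is how the accuracy-based metric represents misinformation.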

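Theorem 2.4. $S \in \mathrm{Det} \implies \forall \mathcal{E}, b'_H \in \mathcal{B}(\mathcal{E}) .\ Q(\mathcal{E}, b'_H) \ge 0$.

Proof. Assume $S \in \mathrm{Det}$ and let $\mathcal{E}$, $b'_H$ be arbitrary.

\begin{align*}
& Q(\mathcal{E}, b'_H) \ge 0 \\
\equiv{}& \langle\ \text{Definition of } Q \text{, arithmetic}\ \rangle \\
& D(b_H \to \dot{\sigma}_H) \ge D(b'_H \to \dot{\sigma}_H) \\
\equiv{}& \langle\ \text{Definition of } D \text{, arithmetic}\ \rangle \\
& \lg b_H(\sigma_H) \le \lg b'_H(\sigma_H) \\
\equiv{}& \langle\ \text{Lemma 2.3 (below), } \lg \text{ is monotonic on } (0, 1] \text{, admissibility of } b\ \rangle \\
& \text{true}
\end{align*}

□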

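Lemma 2.3. Assume $S \in \mathrm{Det}$ and let $\mathcal{E}$, $b'_H$ be arbitrary. Then:

\[ b(\sigma_H) \le b'(\sigma_H). \]

Proof.

\begin{align*}
& b'(\sigma_H) \\
={}& \langle\ \text{Definition of } b'\ \rangle \\
& (\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) | o \upharpoonright H)(\sigma_H) \\
={}& \langle\ \text{Definition of } \upharpoonright H \text{, application to } \sigma_H \text{, one-point rule}\ \rangle \\
& (\textstyle\sum \sigma'_L : (\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) | o)(\sigma'_L \cup \sigma_H)) \\
={}& \langle\ \text{Definition of } | \text{, one-point rule}\ \rangle \\
& \frac{\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(o \cup \sigma_H)}{(\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) \upharpoonright L)(o)} \\
={}& \langle\ \text{High input is immutable}\ \rangle \\
& \frac{b(\sigma_H) \cdot \llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H)(o \cup \sigma_H)}{(\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) \upharpoonright L)(o)} \\
={}& \langle\ \text{Output of } S \text{ is a point mass (see below), let } x \text{ be the denominator}\ \rangle \\
& \frac{b(\sigma_H) \cdot 1}{x} \\
\ge{}& \langle\ \text{Admissibility of } b \text{ implies } x \in (0, 1] \text{, arithmetic}\ \rangle \\
& b(\sigma_H)
\end{align*}

To see that the output of $S$ is a point mass, let $o$ be the observation producing $b'$. It is straightforward to check that if $S \in \mathrm{Det}$, then $\llbracket S \rrbracket\sigma$ is the point mass at $\sigma'$, where $\sigma'$ is the state produced by the standard denotational semantics of while programs, such as Winskel's [125]. So the output of $\llbracket S \rrbracket(\sigma_L \cup \sigma_H)$ is the point mass at $o \cup \sigma_H$. □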

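Theorem 2.5. $Q(\mathcal{E}, b') = \mathcal{L}(H_{in}, L_{in}, L_{out})$.

Proof.

\begin{align*}
& Q(\mathcal{E}, b') \\
={}& \langle\ \text{Definition of } Q\ \rangle \\
& D(b \to \dot{\sigma}_H) - D(b' \to \dot{\sigma}_H) \\
={}& \langle\ \text{Definition of } D\ \rangle \\
& \mathcal{H}(b \upharpoonright (L \cup L^{\circ} \cup H^{\circ})) - \mathcal{H}(b \upharpoonright (L \cup L^{\circ})) \\
& \quad - (\mathcal{H}(b' \upharpoonright (L \cup L^{\circ} \cup H^{\circ})) - \mathcal{H}(b' \upharpoonright (L \cup L^{\circ}))) \\
={}& \langle\ \text{Definition of domain of } b\ \rangle \\
& \mathcal{H}(b \upharpoonright (L^{\circ} \cup H^{\circ})) - \mathcal{H}(b \upharpoonright L^{\circ}) - (\mathcal{H}(b' \upharpoonright (L \cup L^{\circ} \cup H^{\circ})) - \mathcal{H}(b' \upharpoonright (L \cup L^{\circ}))) \\
={}& \langle\ \text{Definitions of } H_{in}, L_{in}, L_{out} \text{; } b' \text{ is an output distribution}\ \rangle \\
& \mathcal{H}(H_{in}, L_{in}) - \mathcal{H}(L_{in}) - (\mathcal{H}(H_{in}, L_{in}, L_{out}) - \mathcal{H}(L_{in}, L_{out})) \\
={}& \langle\ \text{Definition of conditional entropy}\ \rangle \\
& \mathcal{H}(H_{in} \mid L_{in}) - \mathcal{H}(H_{in} \mid L_{in}, L_{out}) \\
={}& \langle\ \text{Definition of } \mathcal{L}\ \rangle \\
& \mathcal{L}(H_{in}, L_{in}, L_{out})
\end{align*}

□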


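Theorem 2.6. Let:

\[
\begin{aligned}
\mathcal{E} &= (S, b_H, \sigma_H, \sigma_L), \\
\delta' &= \llbracket S \rrbracket(\dot{\sigma}_L \otimes \dot{\sigma}_H), \\
e_H &= ((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)) | (\delta' \upharpoonright L)) \upharpoonright H.
\end{aligned}
\]

Then:

\[ Q_E(\mathcal{E}) \le Q(\mathcal{E}, e_H). \]

Proof.

\begin{align*}
& Q_E(\mathcal{E}) \\
={}& \langle\ \text{Definition of } Q_E\ \rangle \\
& \mathrm{E}_{o \in \delta' \upharpoonright L}[Q(\mathcal{E}, B(\mathcal{E}, o))] \\
={}& \langle\ \text{Definition of } Q \text{, let } b'_H = B(\mathcal{E}, o)\ \rangle \\
& \mathrm{E}_{o \in \delta' \upharpoonright L}[D(b_H \to \dot{\sigma}_H) - D(b'_H \to \dot{\sigma}_H)] \\
={}& \langle\ \text{Linearity of } \mathrm{E}\ \rangle \\
& D(b_H \to \dot{\sigma}_H) - \mathrm{E}_{o \in \delta' \upharpoonright L}[D(b'_H \to \dot{\sigma}_H)] \\
\le{}& \langle\ \text{Jensen's inequality and convexity of } D \text{ [32]}\ \rangle \\
& D(b_H \to \dot{\sigma}_H) - D(\mathrm{E}_{o \in \delta' \upharpoonright L}[b'_H] \to \dot{\sigma}_H) \\
={}& \langle\ \text{Lemma 2.4}\ \rangle \\
& D(b_H \to \dot{\sigma}_H) - D(e_H \to \dot{\sigma}_H) \\
={}& \langle\ \text{Definition of } Q\ \rangle \\
& Q(\mathcal{E}, e_H)
\end{align*}

□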

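Lemma 2.4. Let $\mathcal{E}$, $\delta'$, $e_H$ be defined as in theorem 2.6. Let $b'_H = B(\mathcal{E}, o)$, where $o \in \delta' \upharpoonright L$. Then:

\[ \mathrm{E}_{o \in \delta' \upharpoonright L}[b'_H] = e_H. \]

Proof. (by extensionality)

\begin{align*}
& \mathrm{E}_{o \in \delta' \upharpoonright L}[b'_H](\sigma_H) \\
={}& \langle\ \text{Definitions of } \mathrm{E}, b'_H\ \rangle \\
& (\textstyle\sum o : (\delta' \upharpoonright L)(o) \cdot B(\mathcal{E}, o))(\sigma_H) \\
={}& \langle\ \text{Definition of } B(\mathcal{E}, o)\ \rangle \\
& (\textstyle\sum o : (\delta' \upharpoonright L)(o) \cdot (((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)) | o) \upharpoonright H))(\sigma_H) \\
={}& \langle\ \text{Definition of } \delta \upharpoonright H \text{, applying distribution to } \sigma_H\ \rangle \\
& (\textstyle\sum o : (\delta' \upharpoonright L)(o) \cdot (\sum \sigma' : \sigma' \upharpoonright H = \sigma_H : ((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)) | o)(\sigma'))) \\
={}& \langle\ \text{Definition of } \delta | o \text{, applying distribution to } \sigma'\ \rangle \\
& (\textstyle\sum o : (\delta' \upharpoonright L)(o) \cdot (\sum \sigma' : \sigma' \upharpoonright H = \sigma_H \wedge \sigma' \upharpoonright L = o : \dfrac{\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma')}{((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)) \upharpoonright L)(o)})) \\
={}& \langle\ \text{One-point rule}\ \rangle \\
& (\textstyle\sum o : (\delta' \upharpoonright L)(o) \cdot \dfrac{\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(o \cup \sigma_H)}{((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)) \upharpoonright L)(o)}) \\
={}& \langle\ \text{Definition of } \delta \upharpoonright L \text{, applied to } o\ \rangle \\
& (\textstyle\sum o : (\delta' \upharpoonright L)(o) \cdot \dfrac{\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(o \cup \sigma_H)}{(\sum \sigma' : \sigma' \upharpoonright L = o : \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma'))}) \\
={}& \langle\ \text{Let } \sigma = o \cup \sigma_H \text{, change of dummy: } o := \sigma \text{, definition of } =_L\ \rangle \\
& (\textstyle\sum \sigma : \sigma \upharpoonright H = \sigma_H : (\delta' \upharpoonright L)(\sigma \upharpoonright L) \cdot \dfrac{\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma)}{(\sum \sigma' : \sigma' =_L \sigma : \llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H)(\sigma'))}) \\
={}& \langle\ \text{Definition of } \delta | \delta_L \text{, applied to } \sigma\ \rangle \\
& (\textstyle\sum \sigma : \sigma \upharpoonright H = \sigma_H : (\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) | (\delta' \upharpoonright L))(\sigma)) \\
={}& \langle\ \text{Definition of } \delta \upharpoonright H \text{, applied to } \sigma_H\ \rangle \\
& ((\llbracket S \rrbracket(\dot{\sigma}_L \otimes b_H) | (\delta' \upharpoonright L)) \upharpoonright H)(\sigma_H) \\
={}& \langle\ \text{Definition of } e_H\ \rangle \\
& e_H(\sigma_H)
\end{align*}

□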


Theorem  2.7.  Let  £  =  ( S ,  bn ,  Sh,  &l),  where  bH  =  5h-  Then: 

Qe{£)  —  Z(Hin,  Lout\Lin). 

Proof.  Consider  the  amount  of  flow  resulting  from  a  given  high  input  oh,  obser¬ 
vation  o,  and  postbelief  b'H.  We  calculate: 
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Q((S,  bH,aH,aL),b'H ) 

(  Definition  of  Q  ) 

D{bH  — 1>  &H)  —  D(b'H  — 1>  &h) 
(  Definition  of  D  ) 

-  lg(&H  (°ff))  +  lg(6,H(o’/r)) 


lg 


lg 


lg 


(  Log  identity  ) 

Vh^h) 


b  Hip’ll) 

(  Definition  of  b'H  and  5a  ) 

(MWN 

(  Lemma  2.5  ) 

(IWLgfed 


It  is  now  convenient  to  introduce  notation  for  probability.  Let  Pr s{E)  denote 
the  probability  of  event  E  according  to  distribution  5,  and  let  Pr()(L|  F)  denote 
Pr5|F(F).  Let  h  denote  the  event  that  the  high  input  sampled  from  Hin  is  ah/  let  l 
denote  the  event  that  the  low  input  sampled  from  Lin  (which  is  actually  a  point 
mass)  is  aL,  and  let  o  denote  the  event  that  the  observation  sampled  from  Lout 
is  o.  Then  ((5a\o)  \ can  be  rewritten  as  PisA(h\o)‘,  and  5a(<Jh),  as  Pr sA(h). 
We  continue  calculating: 


lg 


lg 


(MlglN 

5a{&h) 

(  Rewriting  using  probability  notation  ) 

PllOIq) 

Pr5A(/z) 
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(  5a  =  SA\aL  ) 


lg 


p*sA\i(h) 


(  Rewriting  using  conditional  probability  notation  ) 


lg 


Pr faWjU) 

PitAW) 


(  Definition  of  conditional  probability  ) 

,  Pr sJJho\!) 

S  Pr,5A(/r|/)  •  Pr<sA(o|Z) 


Now  take  the  expectation  of  the  amount  of  flow  with  respect  to  observation  o, 
which  is  distributed  according  to  5'  =  [S']  (at  <E>  aH). 


Eo[lg 


PisA(h,o\l) 


Pr5Am -PrsMiy 

(  Definition  of  E  ) 


E0pD'(o)  ■  lg 


PrSA(h,o\l) 


PrdA(h\l)  -PrSA(o\l) 

(  5'  =  SA\h,  /;  conditional  probability  notation  ) 

V  Prx  (o\h  l)  ■  k _ PvsA(h,  o\l) _ 

2^o^6a{° \n,l)  1gpr^(/1|/)  .pr<5A(o|/) 


Again  take  the  expectation,  now  with  respect  to  high  input  h,  whose  distribution 
is  6h: 


E4E0Pr<5A(plM)  -  lg 


Pr  sA(h,o\l) 
PrsA(h\l)  ■  Pr«5A(o|/) 


(  Definition  of  E  ) 

E„  PT..W  ■  0  ■  te 


( SH  =  bH ) 
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EhPrto(>0  ■  Eo  PvsA(°\h,  l)  ■  lg  P  r<5A(4f)(.  Pr[A(o|/) 

(  Lemma  2.5  ) 

■  E„Pr*>IM  ■ 

(  5 a\1  =  5a',  conditional  probability  notation  ) 

.  E„  PrsA(o|fc,()  ■  lg  prJIf^(0|;) 

(  Distributivity  ) 

E.E.Pr«,(fcU)  •  PrsMKD  ■  lg 

(  Definition  of  conditional  probability  ) 

EtE,P^(ft.O|0-igPlJ^(ftff(o|;) 

(  Definition  of  conditional  probability  ) 

v  v  Pr«5,  Q,  o,  l)  ,  Pr<54  (h,  o\l) 

P VSa(i)  'lgPr5A(/r|/)  -PtSa(o\1) 

(  5a  is  a  point  mass  at  /,  twice  ) 

Ez  Eh  Eo  PDa (^>  o,  l )  •  lg  Pr^^if^.  pr^(o|/) 

(  Definition  of  mutual  information  ) 

-Potizl-Pjn) 


□ 




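Lemma 2.5. $b_H = \delta_A \upharpoonright H$.


Proof. Let $\sigma_H$ be arbitrary, and let $b = \sigma_L \otimes b_H$ be the attacker's belief about the entire (low and high) state. We calculate:

$(\delta_A \upharpoonright H)(\sigma_H)$

= ( Definition of $\delta_A$ )

$([\![S]\!](\sigma_L \otimes b_H) \upharpoonright H)(\sigma_H)$

= ( Definition of $b$ )

$(([\![S]\!] b) \upharpoonright H)(\sigma_H)$

= ( Definition of $\upharpoonright H$ )

$(\sum \sigma : \sigma \upharpoonright H = \sigma_H : ([\![S]\!] b)(\sigma))$

= ( Definition of $[\![S]\!]\delta$ )

$(\sum \sigma : \sigma \upharpoonright H = \sigma_H : (\sum \sigma' :: b(\sigma') \cdot ([\![S]\!]\sigma')(\sigma)))$

= ( High input is immutable )

$(\sum \sigma : \sigma \upharpoonright H = \sigma_H : (\sum \sigma' : \sigma' \upharpoonright H = \sigma_H : b(\sigma') \cdot ([\![S]\!]\sigma')(\sigma)))$

= ( Commutativity, distributivity )

$(\sum \sigma' : \sigma' \upharpoonright H = \sigma_H : b(\sigma') \cdot (\sum \sigma : \sigma \upharpoonright H = \sigma_H : ([\![S]\!]\sigma')(\sigma)))$

= ( High input is immutable )

$(\sum \sigma' : \sigma' \upharpoonright H = \sigma_H : b(\sigma') \cdot (\sum \sigma :: ([\![S]\!]\sigma')(\sigma)))$

= ( $S$ always terminates )

$(\sum \sigma' : \sigma' \upharpoonright H = \sigma_H : b(\sigma') \cdot 1)$

= ( Definition of $\upharpoonright H$ )

$(b \upharpoonright H)(\sigma_H)$

= ( Definition of $b$ )

$b_H(\sigma_H)$

Therefore $b_H = \delta_A \upharpoonright H$ by extensionality.

□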
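Lemma 2.5 can be checked numerically on a small finite example. The following Python sketch is illustrative only: the program `program`, the low input `sigma_l`, and the prebelief `b_h` are hypothetical choices made here, and distributions are represented as dictionaries. The sketch builds the prediction $\delta_A$ over pairs (low output, high input), keeps the high input immutable, and confirms that projecting onto the high input recovers the prebelief.

```python
import math


def program(l, h):
    """Hypothetical terminating program: low output is l XOR h with
    probability 0.9 and l with probability 0.1; h is never modified."""
    dist = {}
    for out, p in ((l ^ h, 0.9), (l, 0.1)):
        dist[out] = dist.get(out, 0.0) + p
    return dist


sigma_l = 0                     # low input (a point mass)
b_h = {0: 0.3, 1: 0.7}          # attacker's prebelief about the high input

# Prediction delta_A = [[S]](sigma_L (x) b_H), keyed by (low output, high input).
delta_a = {}
for h, p_h in b_h.items():
    for out, p in program(sigma_l, h).items():
        delta_a[(out, h)] = p_h * p

# Projection onto H: sum out the low component.
projected = {}
for (out, h), p in delta_a.items():
    projected[h] = projected.get(h, 0.0) + p

for h in b_h:
    print(h, projected[h], b_h[h])
```

The two printed columns agree up to floating-point rounding because each output distribution of `program` sums to 1, which is the role played by the "S always terminates" step in the proof.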

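Theorem 2.9. $S \in \mathit{ObsDet} \equiv \forall \mathcal{E}, b'_H \in \mathcal{B}(\mathcal{E}) \,.\, Q(\mathcal{E}, b'_H) = 0$.

Proof. By mutual implication.

($\Rightarrow$) Assume $S \in \mathit{ObsDet}$. Let $\mathcal{E} = \langle S, \sigma_L, \sigma_H, b_H, I \rangle$ and $b'_H \in \mathcal{B}(\mathcal{E})$ be arbitrary.

$Q(\mathcal{E}, b'_H) = 0$

= ( Definition of $Q$, arithmetic )

$D(b_H \rightarrow \sigma_H) = D(b'_H \rightarrow \sigma_H)$

= ( Definition of $D$, arithmetic )

$b_H(\sigma_H) = b'_H(\sigma_H)$

= ( Lemma 2.7 )

true

This concludes the forward direction ($\Rightarrow$) of the proof.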
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(<^=)  By  contrapositive.  Assume  S  ObsDet.  We  need  to  show: 

3£  =  (5,  UL,  <th,  bH,  I),  b'H  e  B{S) .  Q(£,  b'H  +  0 

We  calculate: 

S  ObsDet 

=  (  Definition  of  ObsDet ) 

_|V/,  crL35iyaH  •  [5']/(Al  o'h)  \ L  —  8l 

=  (  Predicate  calculus,  change  of  dummy  ) 

31,  aLW8L3aH  .  [3jj(aL  ®  crH)  \L  ±  SL  (*) 

Make  the  following  definitions: 

1  =  1 

(?L  =  crL 

a'H  =  arbitrary 

S'  =  [*S]/(<7L  ®  < t'h ) 

6'l  =  8'\  L 

an  =  the  an  guaranteed  by  formula  (*)  above  when  8L  =  8'L 

5  =  [SU*l  ®  cth) 

8l  =  8\L 

And  let  bH  be  the  belief  mapping  aH  to  1/2  and  o'H  to  1/2. 

We  have  now  defined  all  the  variables  in  experiment  S,  but  we  need  to  define 
b'H  e  B(£).  To  that  end,  we  calculate  attacker  prediction  8a- 
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5a 

=  (  Definition  of  prediction  ) 

[5]/(nL  <8>  bH ) 

=  (  Definition  of  [S'] 5  ) 

1/2  •  [S](<xL  <g>  bH)  +  1/2  •  [S](aL  <g>  <t^) 

=  (  Definition  of  5,5'  ) 

1/2  •  (5  +  5') 

To  define  b'H,  we  also  need  an  observation  o.  Note  that,  by  formula  (2.6), 
5l  7^  5'l,  so  there  is  some  low  state  a'L  such  that  5l(ct'l)  ^  5'L (a/).  Assume, 
without  loss  of  generality,  that  5lK)  >  5^K).  Let  o  be  a'L.  But  in  order  for 
o  to  be  an  observation,  it  must  be  that  o  G  [.S']  (aL  ®  aH),  which  implies  that 
[S]((7l  ®  b h){o)  >  0.  This  is  guaranteed  by  the  fact  that  5l(o)  >  5'L(o),  and  that 

W  >  o. 

We  can  now  calculate  b'H: 

VH 

=  (  Definition  of  b'H  experiment  protocol ) 

$a\o\H 

=  (  Definition  of  5a  ) 

1/2-  (5  +  5')\o\H 
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With  all  these  definitions,  we  can  prove  the  desired  result: 


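$Q(\mathcal{E}, b'_H) \ne 0$

= ( Definition of $Q$, arithmetic )

$D(b_H \rightarrow \sigma_H) \ne D(b'_H \rightarrow \sigma_H)$

= ( Definition of $D$, arithmetic )

$b_H(\sigma_H) \ne b'_H(\sigma_H)$

= ( Lemma 2.9 )

true

□


Lemma 2.6. $S \in \mathit{ObsDet} \implies \forall I \,.\, \forall \sigma_L \,.\, \exists \delta_L \,.\, \forall \delta_H \,.\, \|\delta_H\| = 1 \implies [\![S]\!]_I(\sigma_L \otimes \delta_H) \upharpoonright L = \delta_L$.

Proof. Assume $S \in \mathit{ObsDet}$. Let $I, \sigma_L$ be arbitrary. Let $\delta_L$ be the distribution guaranteed to exist by the definition of ObsDet. Let $\delta_H$ be arbitrary such that $\|\delta_H\| = 1$.

$[\![S]\!]_I(\sigma_L \otimes \delta_H) \upharpoonright L$

= ( Definition of $[\![S]\!]\delta$ )

$(\sum \sigma_H :: \delta_H(\sigma_H) \cdot [\![S]\!]_I(\sigma_L \cup \sigma_H)) \upharpoonright L$

= ( $\upharpoonright L$ distributes over $+$, $\cdot$ )

$(\sum \sigma_H :: \delta_H(\sigma_H) \cdot ([\![S]\!]_I(\sigma_L \cup \sigma_H) \upharpoonright L))$

= ( $S \in \mathit{ObsDet}$, definition of $\delta_L$ )

$(\sum \sigma_H :: \delta_H(\sigma_H) \cdot \delta_L)$

= ( Distributivity, definition of $\|\delta\|$ )

$\|\delta_H\| \cdot \delta_L$

= ( Assumed $\|\delta_H\| = 1$ )

$\delta_L$

□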

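Lemma 2.7. Assume $S \in \mathit{ObsDet}$. Let $\mathcal{E} = \langle S, \sigma_L, \sigma_H, b_H, I \rangle$ and $b'_H \in \mathcal{B}(\mathcal{E})$ be arbitrary. Then:

$b_H = b'_H$.

Proof. Let $\delta_A = [\![S]\!]_I(\sigma_L \otimes b_H)$. Let $o \in [\![S]\!]_I(\sigma_L \otimes \sigma_H) \upharpoonright L$.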

b'H 

=  (  Definition  of  b'H  in  experiment  protocol ) 

{Sa\o)  \ H 

=  (  Definition  of  \  H  ) 

\aH  .(£</:  a'\H  =  aH  :  (5>i|o) (cr7)) 

=  (  Definition  of  S  \  o  ) 

A  on  .(Eff'  :  \ H  =  °H  '■  if  (cr'\L)  =o  then  (~^(o)  else  0) 

=  (  Lemma  2.6  ) 

A aH  .  (E  cr'  ■  cr'  \  H  =  <jh  ’■  if  (a1  \  L)  —  o  then  else  0) 

=  (  One-point  rule  ) 
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\  Lt(o UtTfj) 

X(JH ■  <5  l{o) 

(  Lemma  2.8  ) 

\  t  bH(aH)-SL(o) 

XaH- - M°) - 

(  Arithmetic,  77-reduction  } 


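Lemma 2.8. Assume the definitions in lemma 2.7 and its proof. Then:

$\delta_A(o \cup \sigma_H) = b_H(\sigma_H) \cdot \delta_L(o)$.

Proof.

$\delta_A(o \cup \sigma_H)$

= ( Definition of $\delta_A$ )

$([\![S]\!]_I(\sigma_L \otimes b_H))(o \cup \sigma_H)$

= ( Definition of $[\![S]\!]\delta$ )

$(\sum \sigma' :: (\sigma_L \otimes b_H)(\sigma') \cdot ([\![S]\!]_I \sigma')(o \cup \sigma_H))$

= ( Definition of $\otimes$, one-point rule )

$(\sum \sigma'_H :: b_H(\sigma'_H) \cdot ([\![S]\!]_I(\sigma_L \cup \sigma'_H))(o \cup \sigma_H))$

= ( Immutable high input, one-point rule )

$b_H(\sigma_H) \cdot ([\![S]\!]_I(\sigma_L \cup \sigma_H))(o \cup \sigma_H)$

= ( Immutable high input, definition of $\upharpoonright L$ )

$b_H(\sigma_H) \cdot (([\![S]\!]_I(\sigma_L \cup \sigma_H)) \upharpoonright L)(o)$

= ( $S \in \mathit{ObsDet}$, definition of $\delta_L$ )

$b_H(\sigma_H) \cdot \delta_L(o)$

□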

Lemma  2.9.  Assume  the  definitions  in  the  contrapositive  proof  of  theorem  2.9.  Then: 

$b_H(\sigma_H) \ne b'_H(\sigma_H)$.

Proof.  First  we  calculate  b'H{(TH)' 
b'H  ip  h  ) 

=  (  Definition  of  h'H  ) 

(< 8a\o\H){(th ) 

=  (  Calculation  of  £4  in  theorem  2.9  ) 

(1/2  •  (<5  +  <5')|o  r H){(7h) 

=  (  Definition  of  5\H,  one-point  rule,  D  defined  below  ) 

(Jf  &l  ■  (1/2  •  (5  +  5')\o)(aL  U  <jh)/D ) 

=  (  Definition  of  5\o,  one-point  rule  ) 

1/2-  {5  +  8')(oUaH)/D 
=  (  Definition  of  +  for  distributions  } 

1/2  •  (6(0  U  crH)  +  8\o  U  a’H))/D 
=  (  Definition  of  5',  immutability  of  H  input ) 

1/2  •  <5(o  U  cjh)/D 
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Quantity  D  is  defined  to  be  1/2  ■  (<5(oU  aH)  +  S'(o  U  a'H)).  Similarly,  we  can 
calculate  b'H(cr'H )  =  1/2  •  <5(oU  <j'h)/D. 

We  next  calculate  5l{o ): 

h(o) 

=  (  Definition  of  SL  and  projection  } 

(X  aH  ■  S(o  U  aH )) 

=  (  Definition  of  5,  immutability  of  high  input,  one-point  rule  ) 

5(oUaH) 

Similarly,  S'L(o)  =  d'(oU  a'H ).  By  the  definition  of  o  we  have  5L(o )  7^  <5/(o),  so 
5(oU  a  1 j)  7^  5'(oU  a'H ).  Thus: 

=  (  Calculated  value  of  b’H{(r'H)  ) 

1/2  •  S(o  U  <j'h)/D 
7^  (  Above  inequality  ) 

1/2  •  5(o  U  cth)/D 

=  (  Calculated  value  of  Vh^h)  ) 
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Finally,  note  that  by  the  immutability  of  high  input,  the  only  high  states  with 
non-zero mass in $b'_H$ are $\sigma_H$ and $\sigma'_H$. If $b'_H(\sigma_H) = 1/2$, we would be forced to conclude $b'_H(\sigma'_H) = 1/2$ because the mass in a belief must sum to 1. But this would contradict the previous calculation. So $b'_H(\sigma_H) \ne 1/2$. Thus, since $b_H(\sigma_H) = 1/2$, we conclude $b_H(\sigma_H) \ne b'_H(\sigma_H)$.

□ 


Theorem 3.2. $D(b \rightarrow D) = Q(\langle A, b, D, q \rangle, b') + S_P(\langle A, b, D, q \rangle, b')$.

Proof. The quantity of leakage is

$Q(\langle A, b, D, q \rangle, b') = D(b \rightarrow D) - D(b' \rightarrow D)$.

And the amount of program suppression is

$S_P(\langle A, b, D, q \rangle, b') = D((b' \mid q) \upharpoonright TI \rightarrow D)$

$= D(b' \rightarrow D)$.

The equality follows because $q$ is already contained in the attacker's observation, so $b'$ has already been conditioned on $q$, and because restricting $b'$ to trusted inputs is here equivalent to restricting to secret inputs (i.e., the actual database contents), and this has already been done by the experiment protocol that produced $b'$.

Substituting and rewriting, we have

$D(b \rightarrow D) = Q(\langle A, b, D, q \rangle, b') + S_P(\langle A, b, D, q \rangle, b')$. □
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CHAPTER  3 


QUANTIFICATION  OF  INTEGRITY 

Computer security policies often involve integrity requirements for information and other system resources: for example, that electronic data must correctly represent what appears in paper sources [37, glossary entry "data integrity"], that information may be modified only by authorized programs and authorized users [26], or that inputs to a program must be validated before being used to change system state external to the program, such as the filesystem [122, p. 356]. This last example can be interpreted as an information-flow security policy in which information from (attacker-controlled) inputs can be considered untrusted, whereas the system state should contain only trusted information: information flow from untrusted to trusted is prohibited unless it passes a validation procedure. Taint analysis [75, 93, 112, 122, 128] enforces a similar information-flow policy. Untrusted information is considered to be tainted; and trusted information, untainted. If information flows from tainted sources to a sink that is supposed to be untainted, contamination of the sink has occurred.

In  some  scenarios,  a  qualitative  integrity  policy  might  be  overly  restrictive. 
If  the  attacker  can  cause  only  a  little  contamination,  a  flow  from  tainted  to  un¬ 
tainted  (i.e.,  untrusted  to  trusted)  might  be  acceptable.  Thus,  quantitative  in¬ 
tegrity  policies  would  be  useful  in  characterizing  security. 

Since  confidentiality  and  integrity  are  information-flow  duals  [15],  previous 
models  for  quantification  of  information-flow  confidentiality  [11,24,35,49,76, 
80, 88]  seem  likely  to  apply  to  quantification  of  integrity.  In  particular,  the  in¬ 
tegrity  policy  "information  is  prohibited  to  flow  from  untrusted  to  trusted"  is 
the  dual  of  the  confidentiality  policy  "information  is  prohibited  to  flow  from  se¬ 
cret  to  public,"  which  is  the  kind  of  qualitative  policy  that  previous  work — and 
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chapter  2 — has  made  quantitative.  Here,  we  adapt  the  results  of  chapter  2  to 
quantify  contamination  with  accuracy  of  belief. 

Besides contamination, there is another, distinct aspect of quantitative integrity. In the information-theoretic model of communication channels [32], a sender sends messages through a noisy channel to a receiver. The receiver cannot observe the sender's inputs or the noise but must attempt to determine what message was sent. A standard question to ask is: "how much information is transmitted over the channel?" When information is lost because of noise, information has been suppressed; noise thus damages the integrity of the information. Here, we show that suppression and transmission can be quantified by using accuracy of beliefs. We also examine error-correcting codes and show that, as we would expect, they reduce suppression of information. Moreover, analysis of suppression is applicable with programs in general, not just programs that model communication channels.

Contamination and suppression are not necessarily disjoint. A program that takes t as trusted input and u as untrusted input, then outputs pair (t, u) as trusted output, exhibits contamination, because output (t, u) is obviously affected by untrusted input u, but does not exhibit suppression. A program that instead outputs t ⊕ n, where ⊕ is exclusive-or and n is randomly generated noise, exhibits suppression but not contamination. And a program that outputs t ⊕ u exhibits both.

Quantifying  confidentiality  and  integrity  simultaneously  is  useful  for  under¬ 
standing  the  security  of  a  statistical  database,  which  contains  information  about 
individuals  and  should  respond  to  queries  in  a  way  that  protects  the  privacy 
of  those  individuals.  The  queries  and  responses  might  involve  statistics  (e.g., 
sums  or  averages)  computed  from  individuals'  information.  One  mechanism 
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that  enforces  this  privacy  policy  is  the  addition  of  randomly  generated  noise  to 
the underlying data or to the response [35]; the database is responding with information that has been deliberately suppressed to improve confidentiality. The
quantitative  frameworks  we  have  developed  for  confidentiality  and  integrity 
can  be  used  to  analyze  this  enforcement  mechanism. 

This  chapter  proceeds  as  follows.  Models  and  metrics  for  quantification  of 
contamination  and  suppression  are  given  in  §3.1  and  §3.2.  These  metrics  are 
applied  in  §3.3  and  §3.4  to  error-correcting  codes  and  statistical  databases.  The 
duality  between  confidentiality  and  integrity  is  explored  in  §3.5.  Related  work 
is discussed in §3.6, and §3.7 concludes. Most proofs are delayed from the main body to appendix 3.A.

3.1  Quantification  of  Contamination 

Three  agents  are  involved  in  execution  of  a  program:  a  system,  a  user,  and  an 
attacker.1  The  system  executes  a  program,  whose  variables  are  categorized  as 
input,  output,  or  internal.  Input  variables  may  only  be  read  by  the  program, 
output  variables  may  only  be  written  by  the  program,  and  internal  variables 
may be read and written but are never observed by any agent except the
system  itself.  The  user  and  the  attacker  supply  inputs  by  writing  the  initial  values 
of  input  variables.  These  agents  receive  outputs  by  reading  the  final  values  of 
output  variables.  Since  the  attacker  is  untrusted,  or  low  integrity,  variables  read 
and  written  by  the  attacker  are  labeled  U.  Likewise,  the  user  is  trusted  and  the 
user's  variables  are  labeled  T.  The  channels  between  agents  and  the  program 
are  depicted  in  figure  3.1. 

1In  chapter  2,  we  modeled  only  the  system  and  the  attacker.  We  further  discuss  the  addition 
of  the  user  in  §3.1.1  and  §3.5. 
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Figure  3.1:  Channels  in  contamination  experiment 

Our goal is to quantify the amount of information about untrusted inputs that the user learns by observing trusted outputs. This goal entails two restrictions on the user's access to variables. First, the user should not be allowed to read untrusted inputs; otherwise, the user could learn all the untrusted information without observing any outputs. Second, the user should not be allowed to read untrusted outputs, because we are interested only in the information the user learns from trusted outputs. In addition to these restrictions, for simplicity, we do not allow the user to write untrusted inputs (although this would be possible to model). So the user may access only the trusted variables.

Similarly,  the  attacker  may  access  only  the  untrusted  variables.  The  attacker 
may  not  write  trusted  inputs  because  he  is  untrusted.  And  for  simplicity,  we  do 
not  allow  the  attacker  to  read  trusted  inputs  or  outputs.  However,  since  flow 
from  trusted  to  untrusted  need  not  be  prohibited,  it  would  be  possible  to  allow 
and  to  model  such  reads. 

Note  that  these  access  rules  agree  with  the  Biba  integrity  model  [15]  in  that 
they  prohibit  reading  up  (i.e.,  the  user  cannot  read  untrusted  information)  and 
writing  down  (i.e.,  the  attacker  cannot  write  trusted  information). 
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3.1.1  Contamination  Experiment  Protocol 

Users cannot directly observe untrusted inputs, so users are uncertain about
them.  A  user's  belief  characterizes  this  uncertainty.  Note  that  it  is  now  the  user 
who  holds  beliefs — not  the  attacker,  who  held  beliefs  about  secret  inputs  in  the 
model  of  chapter  2.  Recall  (from  §2.1)  that  beliefs  are  held  about  program  states, 
which  map  variables  to  values.  Previously,  a  state  could  be  decomposed  into 
two parts: its high projection, containing just the secret variables, and its low projection, containing just the public variables. Now, since we are concerned with integrity, we instead decompose a state into a trusted projection, containing just the trusted variables, and an untrusted projection, containing just the untrusted variables. The trusted projection of state $\sigma$ is denoted $\sigma \upharpoonright T$; and the untrusted projection, $\sigma \upharpoonright U$. Probability distributions, hence beliefs, can likewise be projected. Previously, beliefs were probability distributions over the high projection of states; now, beliefs are probability distributions over the untrusted projection of states.

A  contamination  experiment  describes  how  a  user  revises  his  beliefs  about  un¬ 
trusted  inputs.  During  an  experiment,  the  user  interacts  with  a  system  and 
observes  trusted  outputs.  The  protocol  for  contamination  experiments  is  given 
in  figure  3.2  and  is  explained  below.2 

Formally, a contamination experiment $\mathcal{E}$ is described by a tuple,

$\mathcal{E} = \langle S, b_U, \sigma_U, \sigma_T \rangle$,

where $S$ is the program executed by the system, $b_U$ is the user's prebelief about untrusted inputs, $\sigma_U$ is the untrusted projection of the initial state, and $\sigma_T$ is the trusted projection of the initial state. For simplicity,

2This  protocol  is  essentially  identical  to  the  protocol  for  confidentiality  experiments  in  fig¬ 
ure  2.2.  The  changes  are  (i)  the  introduction  of  the  user  as  an  agent,  (ii)  the  reversal  of  the  roles 
of  the  user  and  attacker,  and  (iii)  the  substitution  of  "trusted"  for  "low"  and  "untrusted"  for 
"high." 
A contamination experiment $\mathcal{E} = \langle S, b_U, \sigma_U, \sigma_T \rangle$ is conducted as follows.

1. The user chooses a prebelief $b_U$ about the untrusted state.

2. (a) The attacker picks an untrusted state $\sigma_U$.

(b) The user picks a trusted state $\sigma_T$.

3. The user predicts the output distribution: $\delta'_P = [\![S]\!](\sigma_T \otimes b_U)$.

4. The system executes program $S$, producing a state $\sigma' \in \delta'$ as output, where $\delta' = [\![S]\!](\sigma_T \otimes \sigma_U)$. The user observes the trusted projection of the output state: $o = \sigma' \upharpoonright T$.

5. The user infers a postbelief: $b'_U = (\delta'_P \mid o) \upharpoonright U$.

Figure  3.2:  Contamination  experiment  protocol 

assume  that  S  always  terminates.3  Also  assume  that  the  attacker  and  user  know 
the  code  of  program  S. 

The user's prebelief $b_U$, characterizing his uncertainty about untrusted inputs at the beginning of the experiment, may be chosen arbitrarily.4 The attacker chooses $\sigma_U$, the untrusted projection of the initial state, and the user chooses $\sigma_T$, the trusted projection of the initial state. Using the semantics of $S$ along with prebelief $b_U$ as a distribution on untrusted input, the user conducts a "thought experiment" to generate a prediction $\delta'_P$ of the output distribution:

$\delta'_P = [\![S]\!](\sigma_T \otimes b_U)$.

Program $S$ is executed by the system. The distribution on output states produced by that execution is $\delta'$:

$\delta' = [\![S]\!](\sigma_T \otimes \sigma_U)$.

3This  assumption  can  be  eliminated  by  using  the  technique  described  in  §2.2.4.  Also,  recall 
that  in  confidentiality  experiments  we  assumed  that  S  did  not  modify  any  of  the  secret  (high) 
projection  of  the  state,  because  the  initial  secret  values  needed  to  be  preserved  in  the  final  state. 
To  remove  this  restriction,  §2.2.4  described  a  technique  for  preserving  a  copy  of  the  untrusted 
component  of  the  state.  But  here,  we  have  already  introduced  an  alternate  solution — the  im¬ 
mutable  inputs  preserve  such  a  copy.  Thus  copying  is  not  needed  here. 

4As with confidentiality, an admissibility restriction (cf. §2.1.4) can rule out nonsensical prebeliefs.
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The  user  makes  an  observation,  which  is  the  trusted  projection  of  an  output  state 
sampled from $\delta'$. We write $\sigma' \in \delta'$ to denote that $\sigma'$ is in the support of (i.e., has positive frequency according to) $\delta'$. The observation $o$ resulting from $\sigma'$ is

$o = \sigma' \upharpoonright T$.

Finally, the user's postbelief $b'_U$ is the untrusted projection of the distribution that results from conditioning prediction $\delta'_P$ on observation $o$:

$b'_U = (\delta'_P \mid o) \upharpoonright U$.

Postbelief $b'_U$ characterizes the user's uncertainty about the untrusted inputs at the end of the experiment.
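The prediction and belief-revision steps of the contamination protocol can be sketched in Python for finite state spaces. This is a minimal illustration, not part of the formal model: distributions are dictionaries from values to probabilities, the helper names `predict` and `postbelief` are hypothetical, and a program is assumed to be a function from a trusted input and an untrusted input to a distribution on trusted outputs. Untrusted inputs are treated as immutable, so each predicted state pairs an untrusted input with a trusted output. The example program `noisy_copy` is likewise an assumption made for this sketch.

```python
from collections import defaultdict


def predict(program, sigma_t, belief_u):
    """User's prediction: push the trusted input and the prebelief through
    the program, giving a joint distribution over
    (untrusted input, trusted output) pairs."""
    prediction = defaultdict(float)
    for u, p_u in belief_u.items():
        for out, p_out in program(sigma_t, u).items():
            prediction[(u, out)] += p_u * p_out
    return dict(prediction)


def postbelief(prediction, observation):
    """Condition the prediction on the observed trusted output, then
    project the result onto the untrusted input."""
    mass = sum(p for (_, out), p in prediction.items() if out == observation)
    if mass == 0:
        raise ValueError("observation has probability 0 under the prediction")
    post = defaultdict(float)
    for (u, out), p in prediction.items():
        if out == observation:
            post[u] += p / mass
    return dict(post)


def noisy_copy(t, u):
    """Hypothetical program: trusted output is t XOR u, flipped with
    probability 1/4."""
    return {t ^ u: 0.75, 1 - (t ^ u): 0.25}


prior = {0: 0.5, 1: 0.5}                    # uniform prebelief about u
prediction = predict(noisy_copy, 0, prior)  # trusted input t = 0
posterior = postbelief(prediction, 1)       # user observes trusted output 1
print(posterior)
```

Under these assumptions, observing output 1 shifts the user's belief toward u = 1 without making it certain, because the program's own randomness accounts for part of the observation.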

3.1.2 Contamination Metric

Define the amount of information flow $Q_{con}$ caused by outcome $b'_U$ of experiment $\mathcal{E}$ as the improvement in the accuracy of the user's belief:

$Q_{con}(\mathcal{E}, b'_U) = D(b_U \rightarrow \sigma_U) - D(b'_U \rightarrow \sigma_U)$.

Let $D$ be instantiated with relative entropy as in chapter 2. Thus the unit of measurement for $Q_{con}$ is (information-theoretic) bits.

As an example of quantification of contamination, consider the following program:

t2 := t1 + u

Variables t1 and t2 are trusted, whereas variable u is untrusted. Suppose that t1 and u are one-bit variables (that is, they can store either 0 or 1) but that t2 can store any integer. Let the user have a uniform prebelief $b_U$ about the value of u. Based on his knowledge of t1, the user will correctly infer the value of u by observing t2. For example, if $\sigma_T(t1) = 0$ and $\sigma_U(u) = 1$, then observation $o$ will be that t2 = 1, and postbelief $b'_U$ will assign state $(u \mapsto 1)$ probability 1. Quantity of flow $Q_{con}$ is thus 1 bit. This amount is intuitively sensible: one bit of untrusted information, the value of u, has contaminated the trusted output.
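The arithmetic of this example can be reproduced with a short Python sketch. It assumes, as in the appendix calculations of chapter 2, that $D(b \rightarrow \sigma)$ instantiated with relative entropy reduces to $-\lg b(\sigma)$; the function names `inaccuracy` and `q_con` are hypothetical and chosen here only for illustration.

```python
import math


def inaccuracy(belief, actual):
    """D(b -> sigma) taken as -lg b(sigma), measured in bits."""
    return -math.log2(belief[actual])


def q_con(pre, post, actual):
    """Improvement in the accuracy of the user's belief about the
    untrusted input."""
    return inaccuracy(pre, actual) - inaccuracy(post, actual)


t1, u_actual = 0, 1        # trusted input t1 = 0, untrusted input u = 1
pre = {0: 0.5, 1: 0.5}     # uniform prebelief about u
o = t1 + u_actual          # observed value of t2 after t2 := t1 + u

# Keep only the values of u consistent with the observation, then renormalize.
consistent = {u: p for u, p in pre.items() if t1 + u == o}
total = sum(consistent.values())
post = {u: p / total for u, p in consistent.items()}

flow = q_con(pre, post, u_actual)
print(flow)  # prints 1.0
```

Because the program is deterministic, conditioning on the observation amounts to discarding the values of u that could not have produced it.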

More generally, we can show that $Q_{con}$ correctly quantifies the information contained in an observation $o$ about untrusted input $\sigma_U$. Let $\delta_S = [\![S]\!](\sigma_T \otimes \sigma_U) \upharpoonright T$ be the system's distribution on trusted outputs, and let $\delta_U = [\![S]\!](\sigma_T \otimes b_U) \upharpoonright T$ be the user's distribution on trusted outputs. As in §2.3.1, let $\mathcal{I}_\delta(F)$ denote the information conveyed by event $F$ drawn from probability distribution $\delta$. Then $\mathcal{I}_{\delta_U}(o)$ quantifies the information contained in $o$ about both the untrusted inputs and the probabilistic choices made by the program, but $\mathcal{I}_{\delta_S}(o)$ quantifies only the information about the probabilistic choices. And $Q_{con}$ quantifies just the information about the untrusted inputs:

Corollary 3.1. $Q_{con}(\mathcal{E}, b'_U) = \mathcal{I}_{\delta_U}(o) - \mathcal{I}_{\delta_S}(o)$.

Proof. Identical to the proof of theorem 2.2, with the appropriate textual substitutions for agents and security levels. □
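Corollary 3.1 can be checked numerically on a small probabilistic program. The sketch below is an illustration under stated assumptions: the information conveyed by an event is taken to be $-\lg$ of its probability, $D(b \rightarrow \sigma)$ is taken to be $-\lg b(\sigma)$, and the program t2 := t1 + u + rnd(), the prebelief, and all variable names are hypothetical choices made here.

```python
import math


def info(dist, event):
    """Information conveyed by an event: -lg of its probability, in bits."""
    return -math.log2(dist[event])


def program(t1, u):
    """Hypothetical probabilistic program t2 := t1 + u + rnd(), where rnd()
    returns a uniformly random bit. Returns the distribution on t2."""
    return {t1 + u: 0.5, t1 + u + 1: 0.5}


t1, u_actual = 0, 1
b_u = {0: 0.25, 1: 0.75}          # user's (non-uniform) prebelief about u

# System's distribution on trusted outputs, using the actual untrusted input.
delta_s = program(t1, u_actual)

# User's distribution on trusted outputs, using the prebelief instead.
delta_u = {}
for u, p_u in b_u.items():
    for out, p in program(t1, u).items():
        delta_u[out] = delta_u.get(out, 0.0) + p_u * p

o = 2                             # an observation in the support of delta_s

# Postbelief mass on the actual input, obtained by conditioning on o.
post_actual = b_u[u_actual] * program(t1, u_actual)[o] / delta_u[o]

q_con = -math.log2(b_u[u_actual]) + math.log2(post_actual)
rhs = info(delta_u, o) - info(delta_s, o)
print(q_con, rhs)
```

Both printed values agree: the observation carries about 1.415 bits from the user's point of view, of which 1 bit is attributable to the program's random choice, leaving about 0.415 bits of information about the untrusted input.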

3.2  Quantification  of  Suppression 

We  now  model  a  sender  and  receiver,  who  communicate  through  a  program. 
The  receiver,  by  observing  the  program's  outputs,  attempts  to  determine  the 
sender's  inputs.  For  example,  the  sender  might  be  a  database,  and  the  program 
might  construct  a  web  page  using  queries  to  the  database;  the  receiver  attempts 
to  reconstruct  information  in  the  database  from  the  (incomplete)  information  in 
the  web  page.  As  another  example,  the  program  might  model  a  noisy  channel; 
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Figure  3.3:  Channels  in  suppression  experiment 

the  sender's  inputs  are  messages,  and  the  receiver  attempts  to  determine  what 
messages  were  sent. 

As  with  contamination,  the  program  receives  trusted  inputs  as  the  initial 
values  of  variables  and  produces  trusted  outputs  as  the  final  values  of  variables. 
The  sender  writes  the  initial  values  of  trusted  inputs,  and  the  receiver  reads 
the  final  values  of  trusted  outputs.  These  are  the  only  ways  that  either  agent 
may  access  any  variables.  We  continue  to  model  an  attacker,  who  can  attempt 
to  interfere  with  the  trusted  outputs.  The  attacker  writes  the  initial  values  of 
untrusted  inputs  and  may  also  read  the  final  values  of  untrusted  outputs.  The 
channels  between  agents  and  the  program  are  depicted  in  figure  3.3. 

3.2.1  Suppression  Experiment  Protocol 

Formally, a suppression experiment $\mathcal{E}$ is described by a tuple,

$\mathcal{E} = \langle S, b, \sigma_U, \sigma_T \rangle$,

where $S$ is the program, $b$ is the receiver's prebelief about trusted and untrusted inputs, $\sigma_U$ is the untrusted projection of the initial state, and $\sigma_T$ is the trusted projection of the initial state. Note that the receiver's belief concerns the entire initial state because he may not observe any inputs. The protocol for suppression experiments is given in figure 3.4. In the protocol, notation $\sigma \upharpoonright TO$ denotes
A suppression experiment $\mathcal{E} = \langle S, b, \sigma_U, \sigma_T \rangle$ is conducted as follows.

1. The receiver chooses a prebelief $b$ about the trusted and untrusted state.

2. (a) The attacker picks an untrusted state $\sigma_U$.
(b) The sender picks a trusted state $\sigma_T$.

3. The receiver predicts the output distribution: $\delta'_R = [\![S]\!] b$.

4. The system executes program $S$, which produces a state $\sigma' \in \delta'$ as output, where $\delta' = [\![S]\!](\sigma_T \otimes \sigma_U)$. The receiver observes the trusted projection of the output state: $o = \sigma' \upharpoonright TO$.

5. The receiver infers a postbelief: $b' = (\delta'_R \mid o)$.


Figure  3.4:  Suppression  experiment  protocol 

projection of state $\sigma$ to trusted outputs. The protocol is a straightforward adaptation of the contamination protocol from §3.1.1.


3.2.2  Suppression  Metric 

Define the amount of information flow $Q_{trans}$ (that is, the amount of transmission) caused by outcome $b'$ of experiment $\mathcal{E}$ as the improvement in the accuracy of the receiver's belief about trusted inputs:

$Q_{trans}(\mathcal{E}, b') = D(b \upharpoonright TI \rightarrow \sigma_T) - D(b' \upharpoonright TI \rightarrow \sigma_T)$,

where notation $b \upharpoonright TI$ denotes projection of belief $b$ to trusted inputs.

Quantity $D(b \upharpoonright TI \rightarrow \sigma_T)$ is the maximum amount of information the receiver could learn about trusted inputs. Quantity $Q_{trans}(\mathcal{E}, b')$ is the amount of information the receiver actually learned about trusted inputs from outcome $b'$. Thus, quantity $D(b' \upharpoonright TI \rightarrow \sigma_T)$ is the amount of information the receiver failed to learn about trusted inputs, meaning that it quantifies suppression. Define $S(\mathcal{E}, b')$ to be that quantity:

$S(\mathcal{E}, b') = D(b' \upharpoonright TI \rightarrow \sigma_T)$.
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As  an  example  of  quantification  of  suppression,  consider  the  following  pro¬ 
gram: 

o  :=  i  ©  rnd() 

Variables  i  and  o  are  one-bit  input  and  output  variables,  respectively.  Both  vari¬ 
ables  are  trusted.  Program  expression  rnd()  returns  a  uniformly  random  bit.  Let 
the  receiver  have  a  uniform  prebelief  b  about  the  value  of  i.  As  a  result  of  the 
suppression  experiment  protocol,  the  receiver  infers  a  postbelief  b'  about  i  that 
is  uniform,  thus  b  =  b' .  So  quantity  of  transmission  Qtrans  is  0  bits,  and  quantity 
of  suppression  S  is  1  bit.  These  quantities  are  intuitively  sensible:  the  receiver 
cannot  learn  anything  about  i  by  observing  o  because  of  the  bit  of  random  noise 
added  by  the  program. 
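
The numbers in this example can be reproduced with a few lines of Python. The following is only a sketch under simplifying assumptions: beliefs are dictionaries over the single trusted bit i, the helper names `postbelief` and `inaccuracy` are illustrative and not taken from the text, and inaccuracy toward a point mass is computed as $-\lg$ of the probability assigned to the actual value.

```python
from math import log2

def postbelief(prebelief, likelihood, o):
    """Bayesian update of a belief about i after observing output o."""
    joint = {i: prebelief[i] * likelihood(o, i) for i in prebelief}
    total = sum(joint.values())
    return {i: p / total for i, p in joint.items()}

def inaccuracy(belief, actual):
    """D(belief -> point mass on actual), which reduces to -lg belief(actual)."""
    return -log2(belief[actual])

def xor_noise_likelihood(o, i):
    # o := i XOR rnd(): either output bit is equally likely, whatever i is.
    return 0.5

prebelief = {0: 0.5, 1: 0.5}        # uniform prebelief about i
actual_i, observed_o = 1, 0         # one possible run of the experiment

post = postbelief(prebelief, xor_noise_likelihood, observed_o)
q_trans = inaccuracy(prebelief, actual_i) - inaccuracy(post, actual_i)
suppression = inaccuracy(post, actual_i)

print(q_trans, suppression)  # 0.0 bits transmitted, 1.0 bit suppressed
```

Replacing `xor_noise_likelihood` with a likelihood that is 1 when o equals i and 0 otherwise models the noiseless program o := i, for which the same computation gives 1 bit of transmission and 0 bits of suppression.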

We can show that $Q_{trans}$ correctly quantifies the information about trusted
input $\sigma_T$ contained in observation $o$. Let $\delta_R = ([\![S]\!]\, b) \upharpoonright TO$ be the receiver's
distribution on trusted outputs. Suppose that the sender shares the receiver's
belief about untrusted inputs, meaning that the sender's distribution on
untrusted inputs is $b|\sigma_T$ when the trusted input is $\sigma_T$, and let $\delta_S = ([\![S]\!](b|\sigma_T)) \upharpoonright TO$
be the sender's distribution on trusted outputs.⁵ Then $\mathcal{I}_{\delta_R}(o)$ quantifies the
information contained in $o$ about the trusted inputs, untrusted inputs, and the
probabilistic choices made by the program. And $\mathcal{I}_{\delta_S}(o)$ quantifies only the
information about the untrusted inputs and the probabilistic choices. Thus $Q_{trans}$
quantifies just the information about the trusted inputs:

Theorem 3.1. $Q_{trans}(\mathcal{E}, b') = \mathcal{I}_{\delta_R}(o) - \mathcal{I}_{\delta_S}(o)$.

Proof. In appendix 3.A. □
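
As a numeric sanity check of theorem 3.1, the sketch below evaluates both sides for the program o := i ⊕ u. Everything specific here is an assumption made for illustration: the prebelief values, the function names, and the reading of $\mathcal{I}_\delta(o)$ as $-\lg \Pr_\delta(o)$, which is the form used in the appendix proof.

```python
from math import log2, isclose

# Receiver prebelief over (i, u); the numbers are illustrative assumptions.
b = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def program(i, u):
    """The program o := i XOR u."""
    return i ^ u

def receiver_output_prob(o):
    """delta_R(o): probability of output o under the receiver's prebelief."""
    return sum(p for (i, u), p in b.items() if program(i, u) == o)

def sender_output_prob(o, sigma_t):
    """delta_S(o): probability of o under b conditioned on trusted input sigma_t."""
    mass_t = sum(p for (i, _), p in b.items() if i == sigma_t)
    hit = sum(p for (i, u), p in b.items() if i == sigma_t and program(i, u) == o)
    return hit / mass_t

def q_trans(sigma_t, o):
    """-lg (b|TI)(sigma_t) + lg (b'|TI)(sigma_t), with b' = b conditioned on o."""
    pre = sum(p for (i, _), p in b.items() if i == sigma_t)
    hit = sum(p for (i, u), p in b.items() if i == sigma_t and program(i, u) == o)
    post = hit / receiver_output_prob(o)
    return -log2(pre) + log2(post)

sigma_t, sigma_u = 1, 0
o = program(sigma_t, sigma_u)

lhs = q_trans(sigma_t, o)
rhs = -log2(receiver_output_prob(o)) + log2(sender_output_prob(o, sigma_t))
print(isclose(lhs, rhs))
```

The check only exercises one program and one prebelief, so it illustrates the statement rather than proving it.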

⁵ Another way to rationalize distribution $\delta_S$ is to recognize that it would be the receiver's
distribution on trusted outputs if he were told the value of trusted input $\sigma_T$.
Furthermore, a result similar to theorem 2.7 holds for $Q_{trans}$: if the sender and
receiver use the same distribution $\delta_T$ on trusted inputs, then expected amount
of flow $\mathrm{E}[Q_{trans}]$ is equal to the mutual information between trusted inputs $T_{in}$
and trusted outputs $T_{out}$, given the untrusted inputs $U_{in}$. The expectation is with
respect to observation $o$ and distribution $\delta_T$.

Corollary 3.2. Let $\mathcal{E} = (S, (b|\sigma_U), \delta_T, \sigma_U)$, where $b \upharpoonright TI = \delta_T$. Then:

$$\mathrm{E}[Q_{trans}(\mathcal{E})] = \mathcal{I}(T_{in}, T_{out} \mid U_{in}).$$

Proof. The proof is essentially identical to the proof of theorem 2.7, substituting
"trusted" for "high" and "untrusted" for "low," $T$ for $H$ and $U$ for $L$, etc. □

If program $S$ does not mention any untrusted inputs, the conditioning on
untrusted input $\sigma_U$ can be eliminated from corollary 3.2. In this case, the
expected amount of flow is simply the mutual information between the trusted
inputs  and  the  trusted  outputs.  This  coincides  with  the  standard  information- 
theoretic  model  of  a  communication  channel  [32],  in  which  there  are  no  un¬ 
trusted  inputs — suppression  occurs  only  when  random  errors  are  introduced 
by  the  channel  itself. 

Finally,  as  an  example  of  quantifying  both  contamination  and  suppression, 
consider  this  program: 

o := i ⊕ u

Recall  that  i  and  o  are  one-bit,  trusted  input  and  output  variables,  and  that  u  is  a 
one-bit,  untrusted  input  variable.  Suppose  the  receiver  has  a  uniform  prebelief 
about  inputs  i  and  u.  Then  the  quantity  of  suppression  is  1  bit  because  the 
receiver  cannot  learn  anything  about  i.  Likewise,  if  we  treat  the  receiver  as  a 
user — allowing  him  to  observe  i  and  o — then  the  quantity  of  contamination  is  1 
bit  because  he  learns  everything  about  u. 
3.2.3 Attacker-controlled Suppression


Sometimes  the  attacker  can  control  how  much  suppression  occurs.  For  example, 
consider  the  following  program: 

o  :=  i  +  u 


Assume that inputs i and u are integers in the interval [1, M] for some M > 1.
Output o is therefore an integer in [2, 2M]. If the receiver observes that o is 2,
the receiver can infer that u = i = 1. Hence the attacker, by choosing u = 1,
can make it possible that no information about i is suppressed (though not
necessary, because i might be set to some integer other than 1). But if the attacker
sets u to M, no matter what value of o the receiver observes, all values of i are
still possible. Hence the attacker, by choosing u = M, can make it possible that
all information about i is suppressed. We now formalize this intuition.

Define the quantity of attacker-controlled suppression $S_A$ for a program $S$,
receiver prebelief $b$, and trusted input $\sigma_T$ as follows:

$$S_A(S, b, \sigma_T) \triangleq \max_{\sigma_U,\; b' \in \mathcal{B}((S, b, \sigma_U, \sigma_T))} S((S, b, \sigma_U, \sigma_T), b') \;-\; \min_{\sigma_U,\; b' \in \mathcal{B}((S, b, \sigma_U, \sigma_T))} S((S, b, \sigma_U, \sigma_T), b').$$

This quantity is the difference between the maximum and the minimum amount
of suppression possible over any choice of inputs $\sigma_U$ made by the attacker. For
the  program  above  with  a  uniform  receiver  prebelief,  the  quantity  of  attacker- 
controlled  suppression  is  lg  M  bits.  This  is  intuitively  sensible,  because  the  at¬ 
tacker  can  control  whether  it  is  possible  for  the  receiver  to  learn  everything  or 
nothing  about  i. 
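
The lg M figure can be checked by brute force. The sketch below is illustrative only: it fixes the trusted input to i = 1 (the case discussed above, where the attacker can force o = 2), assumes a uniform prebelief over pairs (i, u), and uses hypothetical function names. Because the program is deterministic, each attacker choice u yields a single outcome.

```python
from math import log2, isclose

def suppression_bits(M, i_actual, u):
    """Suppression for o := i + u under a uniform prebelief on (i, u) in [1, M]^2."""
    o = i_actual + u
    # Input pairs consistent with the observation; the postbelief is uniform
    # over them, and each pair has a distinct value of i.
    consistent = [(i, o - i) for i in range(1, M + 1) if 1 <= o - i <= M]
    return log2(len(consistent))  # -lg(1 / len(consistent))

def attacker_controlled_suppression(M, i_actual):
    """Maximum minus minimum suppression over all attacker choices of u."""
    values = [suppression_bits(M, i_actual, u) for u in range(1, M + 1)]
    return max(values) - min(values)

M = 8  # illustrative interval size
print(attacker_controlled_suppression(M, 1), log2(M))
```

Since the definition of $S_A$ is parameterized by the trusted input, calling `attacker_controlled_suppression` with a different `i_actual` evaluates the same definition for that input.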

Consider this revision of the program above:


o := i1 + i2 + u

Assume that i1 and i2 are integers in the interval [1, M]. If the receiver observes
that o = 3, the receiver can again infer the exact values of i1, i2, and u. So the
attacker can again make it possible that no information is suppressed. But if the
attacker sets u to M, then the receiver will observe that o is in [M + 2, 3M]. Note
that this allows the receiver to eliminate some possibilities for the input values,
since they cannot sum to less than M + 2. Hence if the receiver's prebelief on the
inputs is uniform, his postbelief will not be uniform, meaning that he learned
information about the inputs and that some information was not suppressed.
For example, suppose that the receiver observes that o = M + 2. There are
$\binom{M+1}{2} = \frac{M(M+1)}{2}$ ways to choose input values that sum to M + 2. Each of these
will be equally likely, so the postbelief will assign each probability $\frac{2}{M(M+1)}$. (But
the remaining ways to choose inputs, those that do not sum to M + 2, will have
probability 0, establishing that this distribution is not uniform.) The amount of
suppression is therefore $\lg \frac{M(M+1)}{2}$, which is always less than the total amount of
information the receiver could have learned, $\lg M^2$. This is intuitively sensible,
because the attacker can no longer suppress all the information about trusted
inputs i1 and i2.
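
The counting argument can be confirmed by enumeration. This is a small sketch with an arbitrarily chosen M and illustrative function names; it assumes, as in the text, a uniform prebelief over all triples (i1, i2, u) in [1, M]^3.

```python
from itertools import product
from math import comb, log2

def consistent_inputs(M, o):
    """All (i1, i2, u) in [1, M]^3 with i1 + i2 + u == o."""
    values = range(1, M + 1)
    return [(i1, i2, u) for i1, i2, u in product(values, repeat=3)
            if i1 + i2 + u == o]

def suppression_bits(M, o):
    """-lg of the postbelief mass on the actual (i1, i2) after observing o."""
    support = consistent_inputs(M, o)
    # u is determined by o, i1 and i2, so distinct triples give distinct
    # trusted pairs and the projected postbelief is uniform over them.
    trusted_pairs = {(i1, i2) for i1, i2, _ in support}
    return log2(len(trusted_pairs))

M = 5  # illustrative value
count = len(consistent_inputs(M, M + 2))
print(count, comb(M + 1, 2))
print(suppression_bits(M, M + 2), log2(M ** 2))
```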

3.2.4  Program  Suppression 

Consider the following program:

if u then i2 := i1 else i2 := i1 ⊕ rnd()

Assume that i1 is a 2-bit input variable and i2 is a 2-bit output variable. If the
attacker sets u to true, then i2 equals i1 and no information is suppressed. But
if the attacker sets u to false, all information about i1 is suppressed. It would be
useful to quantify the amount of suppression that the attacker directly controls
versus the amount that is intrinsic in the program itself. The metric for
attacker-controlled suppression did not make this distinction.

Toward that goal, define the quantity of program suppression $S_P$ as follows:

$$S_P(\mathcal{E}, b') \triangleq D((b'|\sigma_U) \upharpoonright TI \rightarrow \dot{\sigma}_T).$$


This definition differs from the definition of suppression $S$ only by conditioning
receiver postbelief $b'$ on untrusted input $\sigma_U$. This conditioning yields the
receiver's postbelief were he told the attacker's untrusted inputs. Any remaining
suppression must come solely from the program.

Define the quantity of attacker-controlled program suppression $S_{PA}$ for a
program $S$, receiver prebelief $b$, and trusted input $\sigma_T$ as follows:

$$S_{PA}(S, b, \sigma_T) \triangleq \max_{\sigma_U,\; b' \in \mathcal{B}((S, b, \sigma_U, \sigma_T))} S_P((S, b, \sigma_U, \sigma_T), b') \;-\; \min_{\sigma_U,\; b' \in \mathcal{B}((S, b, \sigma_U, \sigma_T))} S_P((S, b, \sigma_U, \sigma_T), b').$$

This quantity is the difference between the maximum and the minimum amount
of program suppression possible over any choice of inputs $\sigma_U$ made by the
attacker. For the program above with a uniform receiver prebelief, the quantity
of attacker-controlled program suppression is 2 bits. This is intuitively sensible,
because the attacker controls whether i1 is completely suppressed.


3.3  Error-Correcting  Codes 

An error-correcting code adds redundant information to a message so that
suppression can be detected and corrected. One of the simplest error-correcting
codes is the repetition code $R_n$ [4], which adds redundancy by repeating a
message $n$ times to form a code-word. For example, $R_3$ would encode message 1 as
code-word 111. The code-word is sent over a noisy channel, which might
corrupt the code-word; the receiver receives this possibly corrupted word from the
channel. For example, the sender might send code-word 111 yet the receiver
could receive word 101. To decode the received word, the receiver can employ
nearest-neighbor decoding: the nearest neighbor of a word $w$ is the⁶ code-word $c$
that is closest to $w$ by the Hamming distance metric $d$. Treating words as vectors
of symbols, Hamming distance $d(w, x)$ between words $w$ and $x$ is the number of
positions $i$ at which $w_i \neq x_i$. For the repetition code, nearest-neighbor decoding
is a majority vote: a word is decoded to the symbol that occurs most frequently
in the word. For example, word 101 would be decoded to code-word 111, thus
to message 1, but 001 would be decoded to message 0.
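
Nearest-neighbor decoding for the repetition code is short enough to write out. The sketch below represents words as strings over the alphabet "01"; that representation and the function names are choices made for this illustration only. When two code-words are equally close, `min` returns the first one it encounters, which is one way of making the arbitrary choice.

```python
def hamming_distance(w, x):
    """Number of positions at which words w and x differ."""
    if len(w) != len(x):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(w, x))

def repetition_codewords(n, alphabet="01"):
    """Code-words of the repetition code R_n: each symbol repeated n times."""
    return [symbol * n for symbol in alphabet]

def nearest_neighbor(word, codewords):
    """A code-word closest to word by Hamming distance (ties broken arbitrarily)."""
    return min(codewords, key=lambda c: hamming_distance(word, c))

def decode_repetition(word):
    """Decode a received word of R_n to its message symbol."""
    return nearest_neighbor(word, repetition_codewords(len(word)))[0]

print(decode_repetition("101"), decode_repetition("001"))  # 1 0
```

For odd n and a binary alphabet this agrees with the majority vote described above, because the closer of the two code-words is the one whose symbol occurs more often in the received word.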

Consider the following program, which models the binary symmetric channel
often studied in information theory:⁷

BSC :  i := 1;
       while i ≤ n do
           v_i := t_i  p[]  v_i := not t_i;
           i := i + 1

BSC takes as trusted input an n-bit variable t, and outputs n-bit trusted variable
v. Each bit of the input has probability 1 − p of being flipped in the output.

⁶ The nearest neighbor is not necessarily unique for some codes, in which case an arbitrary
nearest neighbor is chosen.

⁷ Recall from chapter 2 that probabilistic choice $S_1 \;{}_p[]\; S_2$, where $0 < p < 1$, executes program
$S_1$ with probability $p$ or $S_2$ with probability $1 - p$.

Figure 3.5: Model of anonymizer

If n = 1 and the receiver has a uniform prebelief on trusted input t, then
after executing BSC and observing v, the receiver's postbelief b' ascribes
probability p to an input t such that t = v. The amount of program suppression $S_P$ is
thus $-\lg p$. But suppose that the sender and receiver employ repetition code $R_3$
with program BSC: the sender encodes a one-bit input s into three bits $t_1, t_2, t_3$
(so n = 3), inputs those bits to BSC, then the receiver gets three bits $v_1, v_2, v_3$ as
output, and decodes them to one bit r. Let this composed program be $R_3(\mathrm{BSC})$.
Assuming for simplicity that the receiver has a uniform prebelief, postbelief b'
ascribes probability $p^3 + 3p^2(1 - p)$ to actual input s.⁸ The amount of program
suppression $S_P$ is thus $-\lg(p^3 + 3p^2(1 - p))$. So for any $p > 1/2$ (i.e., for any
channel at least slightly biased toward correct transmission), the program
suppression from $R_3(\mathrm{BSC})$ is less than the program suppression from BSC. Repetition
code $R_3$ thus corrects program suppression.
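
The comparison between BSC and the composed program with $R_3$ can be reproduced numerically. The sketch below is an illustration under assumptions: it enumerates the eight flip patterns of a three-bit code-word, counts those that majority-vote decoding survives (at most one flipped bit), and follows the case in the text where the decoded bit equals the input. The function names and the sample value of p are not from the original.

```python
from itertools import product
from math import log2, isclose

def r3_correct_probability(p):
    """Probability that majority-vote decoding of R_3 recovers the sent bit,
    when each bit is transmitted correctly with probability p."""
    total = 0.0
    for flips in product([False, True], repeat=3):
        pattern_prob = 1.0
        for flipped in flips:
            pattern_prob *= (1 - p) if flipped else p
        if sum(flips) <= 1:  # zero or one flipped bits still decode correctly
            total += pattern_prob
    return total

def program_suppression_bsc(p):
    """Program suppression of BSC with n = 1 and a uniform prebelief."""
    return -log2(p)

def program_suppression_r3(p):
    """Program suppression of the composed program with repetition code R_3."""
    return -log2(r3_correct_probability(p))

p = 0.9  # illustrative channel reliability
print(r3_correct_probability(p))
print(program_suppression_bsc(p), program_suppression_r3(p))
```

Evaluating the two suppression functions over a range of p values between 1/2 and 1 is a quick way to see the gap between them shrink as the channel becomes more reliable.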


3.4  Statistical  Databases 


The  introduction  to  this  chapter  suggested  that  mechanisms  used  by  statistical 
databases  to  create  anonymized  responses  to  queries  can  be  characterized  as 
sacrificing  integrity  to  improve  confidentiality.  We  can  now  make  this  charac¬ 
terization  precise  by  using  our  models. 

As depicted in figure 3.5, we model the anonymizer with a program that
receives two inputs. The first input is the user's query, which contains public
information. The second input is a response containing secret information
from the database, perhaps even the entire contents of the database. The
anonymizer produces an anonymized response as public output.⁹ The user is an
attacker against confidentiality, because he might be attempting to learn secret
information through his query. Since the model we have just described
coincides with our model for quantitative confidentiality, it is straightforward to
analyze the amount of information leaked by the anonymizer using the techniques
in §2.3. In particular, metric Q is the quantity of leakage.

⁸ This probability can be derived either by evaluating the program semantics directly, or by
the following argument. Decoded output r equals input s if exactly zero or one bits in code
word $t_1 t_2 t_3$ are flipped during transmission. Each bit $t_i$ is transmitted correctly with probability
p and flipped with probability 1 − p. The probability that zero bits are flipped is thus $p^3$; the
probability that only a particular bit $t_i$ is flipped is $p^2(1 - p)$; and there are three possible single bits
that could be flipped. So the total probability of correct decoding is $p^3 + 3p^2(1 - p)$.

But  the  anonymizer  also  acts  as  a  noisy  communication  channel,  where  the 
database  is  the  sender  and  the  user  is  the  receiver.  The  input  to  this  chan¬ 
nel  is  trusted  input  from  the  database,  and  the  output  from  the  channel  is  the 
trusted,  anonymized  response  to  the  user.  The  query  input  from  the  user  could 
be  deemed  untrusted,  but  the  user  is  not  an  attacker  against  integrity  because 
he  does  not  attempt  to  reduce  the  amount  of  information  he  learns  through  the 
channel — indeed,  he  would  prefer  to  increase  the  amount.  So  although  there 
is  no  attacker-controlled  suppression,  the  anonymizer  causes  program  suppres¬ 
sion  as  quantified  by  Sp. 

We  can  relate  the  quantity  of  leakage  to  the  amount  of  program  suppression. 
Let  A  be  the  anonymizer  program,  b  be  the  user's  prebelief  about  the  database, 
d be  the  actual  database  contents,  q  be  the  user's  query,  and  b'  be  the  user's  post¬ 
belief  after  observing  the  anonymized  response.  Then  we  obtain  the  following 
theorem: 

⁹ The anonymizer might also produce some output about the anonymization it just
performed, and this output might be stored in the database and used during future
anonymizations. This output would be secret; we do not model it here.
Figure 3.6: Information flows in a system. Dashed lines are uninteresting from
our security perspective.

Theorem 3.2. $D(b \rightarrow d) = Q((A, b, d, q), b') + S_P((A, b, d, q), b')$.

Proof. In appendix 3.A. □

This theorem means that the quantity of leakage plus the quantity of program
suppression is constant for a given experiment and outcome. That constant
is inaccuracy $D(b \rightarrow d)$ in the user's prebelief b about the database
contents d. This is intuitively sensible, because $D(b \rightarrow d)$ is the total amount of
information the user could possibly learn about the database contents. All of
that information is either communicated to the user (quantity $Q((A, b, d, q), b')$)
or suppressed (quantity $S_P((A, b, d, q), b')$).

3.5  Duality  of  Integrity  and  Confidentiality 

Consider  a  program  that  processes  two  levels  of  information,  low  and  high, 
denoted  L  and  H.  We  take  as  the  defining  characteristic  of  low  information 
that  its  use  be  unrestricted  in  the  program.  For  confidentiality,  low  is  therefore 
synonymous  with  public,  and  for  integrity,  low  is  synonymous  with  trusted. 
Analogously,  we  take  as  the  defining  characteristic  of  high  information  that  its 
use  be  restricted  in  the  program.  So  for  confidentiality,  high  is  synonymous  with 
secret,  and  for  integrity,  high  is  synonymous  with  untrusted. 
Figure 3.7: Dualities between integrity (I) and confidentiality (C)

Let HI denote the high inputs to the system; LO, the low outputs; etc. As
depicted in figure 3.6, there are four information flows between inputs and outputs
in this system: LI → LO, HI → LO, LI → HO, and HI → HO. The two flows
to HO are uninteresting from our security perspective because high outputs do
not need to be protected: for confidentiality, it does not matter what
information flows to secret outputs; and for integrity, it does not matter what
information flows to untrusted outputs. However, the remaining two flows to
LO are interesting and exhibit dualities, which are summarized in figure 3.7 and
discussed below.

Flow HI → LO is the standard problem with which information-flow
security has been concerned. For confidentiality, this is the flow, or leakage, from
secret inputs to public outputs; §2.2 and §2.3 presented our framework for
quantification of leakage. For integrity, this is the flow from untrusted inputs to
trusted outputs; this flow was named contamination in §3.1. Contamination of
trusted information is therefore the information-flow dual of leakage of secret
information: both quantify how much information flows between inputs and
outputs at different security levels. Indeed, our framework for quantification
of contamination was nearly the same as our framework for quantification of
leakage. We needed to introduce a new agent, the user, in the integrity model.
But the user could have been included in the confidentiality model; the user's
role there would have been to choose secret inputs. Note that the user and
attacker reverse roles in the two models: for confidentiality, the attacker holds
belief about high (secret) inputs, and for integrity, the user holds belief about
high (untrusted) inputs.

Define attenuation as the amount of information that does not flow from an
input to an output. The amount of actual flow of information is therefore the
amount of information that could possibly flow less the amount of attenuation.
For confidentiality, the attenuation of HI → LO is the distance $D(b'_H \rightarrow \dot{\sigma}_H)$
from attacker's postbelief $b'_H$ to state $\sigma_H$. This distance is the amount of secret
information that is not leaked to the attacker; we could call this attenuation
hiding of information. Dually, for integrity, distance $D(b'_U \rightarrow \dot{\sigma}_U)$ is the amount
of untrusted information that does not contaminate the trusted outputs; we could
call this attenuation hygiene because it preserves the "cleanliness" of the trusted
outputs.

Flow LI → LO can be understood as one of the standard problems with
which  classical  information  theory  is  concerned.  For  integrity,  this  is  the  flow,  or 
transmission,  from  trusted  inputs  to  trusted  outputs;  our  framework  for  quan¬ 
tifying  the  flow  and  its  attenuation,  which  we  named  suppression,  was  given  in 
§3.2.  For  confidentiality,  this  flow  is  uninteresting:  the  amount  of  information 
that  flows  from  public  inputs  to  public  outputs  does  not  characterize  how  the 
program  leaks  or  hides  secret  information.  So  there  does  not  seem  to  be  a  dual 
to  this  flow. 
3.6 Related Work


Newsome,  Song,  and  McCamant  [94]  quantify  the  amount  of  influence  an  at¬ 
tacker  can  exert  over  the  execution  of  a  program  as  the  logarithm  of  the  size 
of  the  set  of  possible  outputs.  Assuming  that  programs  are  deterministic  and 
that  all  inputs  are  either  under  the  control  of  the  attacker  or  are  fixed  constants, 
this  quantity  is  the  channel  capacity  of  the  program.  Our  definition  of  con¬ 
tamination  generalizes  this  definition  by  allowing  probabilistic  programs  and 
trusted  inputs  that  are  not  under  the  control  of  the  attacker.  Also,  their  defini¬ 
tion  conservatively  assumes  a  uniform  distribution  over  outputs,  but  the  defi¬ 
nitions  given  here  allow  arbitrary  distributions  over  inputs  and  outputs.  How¬ 
ever,  they  implement  a  dynamic  analysis  that  automatically  quantifies  influence 
in  real-world  programs. 

Kifer  and  Gehrke  [63]  quantify  the  utility  of  anonymized  data  with  relative 
entropy  (there  called  Kullback-Leibler  divergence).  They  use  this  metric  to  se¬ 
lect  among  different  anonymizations  of  a  dataset. 

Biba [15] first identified a duality between confidentiality and integrity,
modeling integrity with a dual of the Bell-LaPadula model of confidentiality. Similar
dualities have been exploited in Flume [67] and recent versions of Jif [22].

Clark  and  Wilson  [26]  propose  a  different  kind  of  integrity  policy,  suitable 
for  commercial  organizations,  based  on  well-formed  transactions  and  verifica¬ 
tion  procedures.  We  have  not  investigated  quantitative  generalizations  of  this 
policy. 
3.7 Summary


This  chapter  presents  an  information-flow  model  for  quantification  of  integrity. 
We  introduced  two  novel  information-flow  integrity  metrics,  contamination 
and  suppression.  Both  metrics  are  defined  by  adapting  our  belief-based  model 
(in  chapter  2)  for  quantification  of  confidentiality.  We  have  shown  that  our 
metric  for  suppression  agrees  with  the  classical  information-theoretic  metric  for 
channel  capacity.  We  have  also  applied  our  definition  to  the  analysis  of  error- 
correcting  codes  and  statistical  databases. 
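3.A Appendix: Proofs


Theorem 3.1. $Q_{trans}(\mathcal{E}, b') = \mathcal{I}_{\delta_R}(o) - \mathcal{I}_{\delta_S}(o)$.

Proof.

    $Q_{trans}(\mathcal{E}, b')$
=   ⟨ Definition of $Q_{trans}$ ⟩
    $D(b \upharpoonright TI \rightarrow \dot{\sigma}_T) - D(b' \upharpoonright TI \rightarrow \dot{\sigma}_T)$
=   ⟨ Definitions of $D$ and point mass ⟩
    $-\lg (b \upharpoonright TI)(\sigma_T) + \lg (b' \upharpoonright TI)(\sigma_T)$
=   ⟨ Lemma 3.1 (below), properties of lg ⟩
    $-\lg \Pr_{\delta_R}(o) + \lg \Pr_{\delta_S}(o)$
=   ⟨ Definition of $\mathcal{I}$ ⟩
    $\mathcal{I}_{\delta_R}(o) - \mathcal{I}_{\delta_S}(o)$

□

Lemma 3.1. $(b' \upharpoonright TI)(\sigma_T) = (b \upharpoonright TI)(\sigma_T) \cdot \frac{\delta_S(o)}{\delta_R(o)}$.

Proof.

    $(b' \upharpoonright TI)(\sigma_T)$
=   ⟨ Definition of $b'$ in suppression experiment protocol ⟩
    $((([\![S]\!]b)|o) \upharpoonright TI)(\sigma_T)$
=   ⟨ Definition of $\delta \upharpoonright TI$ ⟩
    $(\Sigma\, \sigma : \sigma \upharpoonright TI = \sigma_T : (([\![S]\!]b)|o)(\sigma))$
=   ⟨ Definition of $\delta|o$ ⟩
    $(\Sigma\, \sigma : \sigma \upharpoonright TI = \sigma_T \wedge \sigma \upharpoonright TO = o : \frac{([\![S]\!]b)(\sigma)}{(([\![S]\!]b) \upharpoonright TO)(o)})$
=   ⟨ Definition of $\sigma \upharpoonright T$ ⟩
    $(\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : \frac{([\![S]\!]b)(\sigma)}{(([\![S]\!]b) \upharpoonright TO)(o)})$
=   ⟨ Distributivity ⟩
    $\frac{1}{(([\![S]\!]b) \upharpoonright TO)(o)} \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : ([\![S]\!]b)(\sigma))$
=   ⟨ Definition of $\delta_R$ ⟩
    $\frac{1}{\delta_R(o)} \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : ([\![S]\!]b)(\sigma))$
=   ⟨ Definition of $[\![S]\!]\delta$ ⟩
    $\frac{1}{\delta_R(o)} \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : (\Sigma\, \sigma' : b(\sigma') \cdot ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Input is immutable, so $\sigma$ and $\sigma'$ must agree on it ⟩
    $\frac{1}{\delta_R(o)} \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : (\Sigma\, \sigma' : \sigma' \upharpoonright TI = \sigma_T : b(\sigma') \cdot ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Associativity ⟩
    $\frac{1}{\delta_R(o)} \cdot (\Sigma\, \sigma' : \sigma' \upharpoonright TI = \sigma_T : (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : b(\sigma') \cdot ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Distributivity ⟩
    $\frac{1}{\delta_R(o)} \cdot (\Sigma\, \sigma' : \sigma' \upharpoonright TI = \sigma_T : b(\sigma') \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Unit of $\cdot$ ⟩
    $\frac{1}{\delta_R(o)} \cdot \frac{(b \upharpoonright TI)(\sigma_T)}{(b \upharpoonright TI)(\sigma_T)} \cdot (\Sigma\, \sigma' : \sigma' \upharpoonright TI = \sigma_T : b(\sigma') \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Distributivity ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot (\Sigma\, \sigma' : \sigma' \upharpoonright TI = \sigma_T : \frac{b(\sigma')}{(b \upharpoonright TI)(\sigma_T)} \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Definition of $b|\sigma_T$, using range of $\sigma'$ ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot (\Sigma\, \sigma' : \sigma' \upharpoonright TI = \sigma_T : (b|\sigma_T)(\sigma') \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Distributivity ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot (\Sigma\, \sigma' : \sigma' \upharpoonright TI = \sigma_T : (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : (b|\sigma_T)(\sigma') \cdot ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Associativity ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : (\Sigma\, \sigma' : \sigma' \upharpoonright TI = \sigma_T : (b|\sigma_T)(\sigma') \cdot ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Input is immutable, so $\sigma$ and $\sigma'$ must agree on it ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : (\Sigma\, \sigma' : (b|\sigma_T)(\sigma') \cdot ([\![S]\!]\sigma')(\sigma)))$
=   ⟨ Definition of $[\![S]\!]\delta$ ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot (\Sigma\, \sigma : \sigma \upharpoonright T = (\sigma_T \cup o) : ([\![S]\!](b|\sigma_T))(\sigma))$
=   ⟨ Definition of $b|\sigma_T$ ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot (\Sigma\, \sigma : \sigma \upharpoonright TO = o : ([\![S]\!](b|\sigma_T))(\sigma))$
=   ⟨ Definition of $\delta \upharpoonright TO$ ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot (([\![S]\!](b|\sigma_T)) \upharpoonright TO)(o)$
=   ⟨ Definition of $\delta_S$ ⟩
    $\frac{1}{\delta_R(o)} \cdot (b \upharpoonright TI)(\sigma_T) \cdot \delta_S(o)$
=   ⟨ Commutativity ⟩
    $(b \upharpoonright TI)(\sigma_T) \cdot \frac{\delta_S(o)}{\delta_R(o)}$

□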
CHAPTER 4


FORMALIZATION  OF  SECURITY  POLICIES* 

The Trusted Computer System Evaluation Criteria (TCSEC) [37], also known
as the "Orange Book," establishes verified design as the highest security
certification that a computer system can obtain.¹ To meet (in part) the criteria for verified
design, a system must be accompanied by a formal security policy, a formal
design, and a formal or informal proof that the design satisfies the policy. Some
lower levels of certification also require the statement of a formal security
policy.² So formal techniques for specification of security policies are necessary to
achieve high levels of certification.

Recall  from  chapter  1  that  the  theory  of  trace  properties  seems  appealing  as 
a  formal  technique  for  specification  of  security  policies,  but  that  security  poli¬ 
cies  such  as  noninterference  and  mean  response  time  are  not  trace  properties. 
Sets  of  trace  properties,  however,  are  sufficient  to  formalize  security  policies.  In 
chapter  1,  we  named  these  sets  hyperproperties.  A  theory  of  hyperproperties  is 
developed  in  this  chapter.  We  generalize  safety  and  liveness,  and  their  topo¬ 
logical  characterizations,  from  trace  properties  to  hyperproperties.  We  identify 
a  subclass  of  hypersafety,  called  k-safety,  for  which  we  give  a  relatively  com¬ 
plete  verification  methodology.  And  we  show  that  every  hyperproperty  is  the 
intersection  of  a  safety  hyperproperty  and  a  liveness  hyperproperty. 

This chapter proceeds as follows. Hyperproperties, hypersafety, k-safety,
and hyperliveness are defined and explored in §4.1, §4.2, §4.3, and §4.4,
respectively. A topological account of hyperproperties is given in §4.5. The
hyperproperty intersection theorem is presented in §4.6, and §4.7 concludes. Most proofs
are delayed from the main body to appendix 4.A.

*This chapter contains material from a previously published paper [29], which is © 2008
IEEE and reprinted, with permission, from Proceedings of the 21st IEEE Computer Security
Foundations Symposium.

¹ Verified design is designated "Class A1" by the TCSEC.

² These certifications are structured protection and security domains, designated "Class B2" and
"Class B3" by the TCSEC.

4.1  Hyperproperties 

We  model  system  execution  with  traces,  where  a  trace  is  a  sequence  of  states;  by 
employing rich enough notions of state, this model can encode other
representations of execution.³

The structure of a state is not important in the following definitions, so we
leave set $\Sigma$ of states abstract. However, the structure of a state is important
for real examples, and we introduce predicates and functions, on states and on
traces, as needed to model events, timing, probability, etc.

Traces may be finite or infinite sequences, which we categorize into sets:

$$\Psi_{fin} \triangleq \Sigma^{*},$$
$$\Psi_{inf} \triangleq \Sigma^{\omega},$$
$$\Psi \triangleq \Psi_{fin} \cup \Psi_{inf},$$

where $\Sigma^{*}$ denotes the set of all finite sequences over $\Sigma$, and $\Sigma^{\omega}$ denotes the set
of all infinite sequences over $\Sigma$. For trace $t = s_0 s_1 \ldots$ and index $i \in \mathbb{N}$, we define
the following indexing notation:

$$t[i] \triangleq s_i,$$
$$t[..i] \triangleq s_0 s_1 \ldots s_i,$$
$$t[i..] \triangleq s_i s_{i+1} \ldots$$

³ Chapter 5 shows how to model a labeled transition system as a set of traces by including
transition labels in states, thereby preserving information about the nondeterministic branching
structure of the system. This encoding is also used by chapter 5 to model state machines and
probabilistic systems.
We denote concatenation of finite trace $t$ and (finite or infinite) trace $t'$ as $tt'$, and
we denote the empty trace as $\epsilon$.

A  system  is  modeled  by  a  non-empty  set  of  infinite  traces,  called  its  executions. 
If  an  execution  terminates  (and  thus  could  be  represented  by  a  finite  trace),  we 
represent  it  as  an  infinite  trace  by  infinitely  stuttering  the  final  state  in  the  finite 
trace. 

4.1.1  Trace  Properties 

A trace property is a set of infinite traces [5, 70]. The set of all trace properties is

$$\mathsf{Prop} \triangleq \mathcal{P}(\Psi_{inf}),$$

where $\mathcal{P}$ denotes powerset. A set $T$ of traces satisfies a trace property $P$, denoted
$T \models P$, iff all the traces of $T$ are in $P$:

$$T \models P \;\triangleq\; T \subseteq P.$$

Some security policies are expressible as trace properties. For example,
consider the policy "The system may not write to the network after reading from a
file." Formally, this is the set of traces

$$\mathit{NRW} \triangleq \{t \in \Psi_{inf} \mid \neg(\exists\, i, j \in \mathbb{N} : i < j \wedge \mathit{isFileRead}(t[i]) \wedge \mathit{isNetworkWrite}(t[j]))\}, \qquad (4.1.1)$$

where isFileRead and isNetworkWrite are state predicates.
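
Membership in NRW concerns infinite traces, but a violation always shows up in a finite prefix, so a prefix checker conveys the idea. The following is a sketch under assumptions: states are modeled as plain event labels, the two predicates are hypothetical stand-ins for isFileRead and isNetworkWrite, and the function only reports whether the given prefix already contains a read followed strictly later by a write.

```python
def violates_nrw(prefix, is_file_read, is_network_write):
    """Return True if some state satisfying is_file_read occurs at an index
    strictly before a state satisfying is_network_write."""
    seen_read = False
    for state in prefix:
        # Test the write first so that only strictly earlier reads count (i < j).
        if seen_read and is_network_write(state):
            return True
        if is_file_read(state):
            seen_read = True
    return False

# Hypothetical state encoding for illustration: each state is an event label.
def is_file_read(state):
    return state == "file_read"

def is_network_write(state):
    return state == "net_write"

print(violates_nrw(["net_write", "file_read", "idle"], is_file_read, is_network_write))
print(violates_nrw(["file_read", "idle", "net_write"], is_file_read, is_network_write))
```

A result of True means that no infinite extension of the prefix is in NRW; a result of False only says that no violation has occurred so far.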

Similarly, access control is a trace property requiring every operation to be
consistent with its requestor's rights:

$$\mathit{AC} \triangleq \{t \in \Psi_{inf} \mid (\forall\, i \in \mathbb{N} : \mathit{rightsReq}(t[i]) \subseteq \mathit{acm}(t[i-1])[\mathit{subj}(t[i]), \mathit{obj}(t[i])])\}. \qquad (4.1.2)$$

Function acm(s) yields the access control matrix in state s. Function subj(s)
yields the subject who requested the operation that led to state s, function obj(s)
yields the object involved in that operation, and function rightsReq(s) yields the
rights required for the operation to be allowed.

As another example, guaranteed service is a trace property requiring that every
request for service is eventually satisfied:

\[ GS \triangleq \{ t \in \Psi_{inf} \mid (\forall i \in \mathbb{N} : \mathit{isReq}(t[i]) \implies (\exists j > i : \mathit{isRespToReq}(t[j], t[i]))) \}. \tag{4.1.3} \]

Predicate isReq(s) identifies whether a request is initiated in state s, and predicate
isRespToReq(s', s) identifies whether state s' completes the response to the
request initiated in state s.

4.1.2  Hyperproperties 

A hyperproperty is a set of sets of infinite traces, or equivalently a set of trace
properties. The set of all hyperproperties is

\[ \mathbf{HP} \triangleq \mathcal{P}(\mathcal{P}(\Psi_{inf})) = \mathcal{P}(\mathsf{Prop}). \]

The interpretation of a hyperproperty as a security policy is that the hyperproperty
is the set of systems allowed by that policy.4 Each trace property in a
hyperproperty is an allowed system, specifying exactly which executions must
be possible for that system. Thus a set T of traces satisfies hyperproperty H,
denoted $T \models \mathbf{H}$, iff T is in H:

\[ T \models \mathbf{H} \;\triangleq\; T \in \mathbf{H}. \]

4The  hyperproperty  might  also  contain  the  empty  set  of  traces,  although  this  set  does  not 
correspond  to  a  system. 
Note the use of bold face to denote hyperproperties (e.g., H) and sans serif
to denote sets of trace properties (e.g., Prop). Although a hyperproperty and a
set  of  trace  properties  are  mathematically  the  same  kind  of  object  (a  set  of  sets 
of  traces),  they  are  used  differently  in  formulas,  hence  the  different  typography. 
Sets  of  hyperproperties  are  simultaneously  bold  face  and  sans  serif  (e.g.,  HP). 

Given  a  trace  property  P,  there  is  a  unique  hyperproperty  denoted  [P]  that 
expresses  the  same  policy  as  P.  We  call  this  hyperproperty  the  lift  of  P.  For  P 
and  [P]  to  express  the  same  policy,  they  must  be  satisfied  by  the  same  sets  of 
traces.  Thus  we  can  derive  a  definition  of  [P] : 

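\begin{align*}
      & (\forall T \in \mathsf{Prop} : T \models P \iff T \models [P]) \\
={}\; & (\forall T \in \mathsf{Prop} : T \subseteq P \iff T \in [P]) \\
={}\; & [P] = \{ T \in \mathsf{Prop} \mid T \subseteq P \} \\
={}\; & [P] = \mathcal{P}(P).
\end{align*}

Consequently, the lift of P is the powerset of P:

\[ [P] \triangleq \mathcal{P}(P). \]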
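
The equivalence behind the lift can be checked exhaustively on a small finite universe. The sketch below is only illustrative: the three opaque trace names are assumptions, and real traces are infinite sequences rather than strings.

```python
from itertools import chain, combinations

def powerset(xs):
    """All subsets of xs, each as a frozenset."""
    items = list(xs)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(s) for s in subsets}

def satisfies_property(T, P):
    return set(T) <= set(P)          # T |= P  iff  T is a subset of P

def satisfies_hyperproperty(T, H):
    return frozenset(T) in H         # T |= H  iff  T is a member of H

def lift(P):
    return powerset(P)               # [P] is the powerset of P

universe = {"t1", "t2", "t3"}        # opaque stand-ins for traces
P = {"t1", "t2"}
agree = all(satisfies_property(T, P) == satisfies_hyperproperty(T, lift(P))
            for T in powerset(universe))
print(agree)
```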

4.1.3  Hyperproperties  in  Action 

Trace  properties  are  satisfied  by  traces,  whereas  hyperproperties  are  satisfied 
by  sets  of  traces.  This  additional  level  of  sets  means  that  hyperproperties  can  be 
more  expressive  than  trace  properties.  We  explore  this  added  expressivity  with 
some  examples. 

Secure information flow. Information-flow security policies express restrictions
on what information may be learned by users of a system. Users interact
with systems by providing inputs and observing outputs. To model this interaction,
define ev(s) as the input or output event, if any, that occurs when a system
transitions to state s. Assume that at most one event, input or output, can occur
at each transition. For a trace t, extend this notation to ev(t), denoting the sequence
of events resulting from application of ev(·) to each state in trace t.5 Further
assume that each user of a system is cleared either at confidentiality level
L, representing low (public) information, or H, representing high (secret) information,
and that each event is labeled with one of these confidentiality levels.
Define $ev_L(t)$ to be the subsequence of low input and output events contained
within ev(t), and $ev_{Hin}(t)$ to be the subsequence of high input events contained
within ev(t).

Noninterference, as defined by Goguen and Meseguer [46], requires that commands
issued by users holding high clearances be removable without affecting
observations of users holding low clearances. Treating commands as inputs
and observations as outputs, we model this security policy as a hyperproperty
requiring a system to contain, for any trace t, a corresponding trace t' with no
high inputs yet with the same low events as t:

\[ \mathbf{GMNI} \triangleq \{ T \in \mathsf{Prop} \mid T \in \mathbf{SM} \land (\forall t \in T : (\exists t' \in T : ev_{Hin}(t') = \epsilon \land ev_L(t) = ev_L(t'))) \}. \tag{4.1.4} \]

Conjunct $T \in \mathbf{SM}$ expresses the requirement, made by Goguen and Meseguer's
formalization, that systems are deterministic state machines (§5.4 defines SM
formally). GMNI is not a trace property because trace t is allowed only if corresponding
trace t' is also allowed.

5Depending  on  the  nature  of  events  in  the  particular  system  that  is  being  modeled,  it  might 
be  appropriate  for  ev(t)  to  eliminate  stuttering  of  events. 
Generalized noninterference [81] extends Goguen and Meseguer's definition
of noninterference to handle nondeterministic systems, which are the systems
modeled by Prop. McLean [86] reformulates generalized noninterference as a
policy requiring a system to contain, for any traces $t_1$ and $t_2$, an interleaved
trace $t_3$ whose high inputs are the same as those of $t_1$ and whose low events are
the same as those of $t_2$. This is a hyperproperty:

\[ \mathbf{GNI} \triangleq \{ T \in \mathsf{Prop} \mid (\forall t_1, t_2 \in T : (\exists t_3 \in T : ev_{Hin}(t_3) = ev_{Hin}(t_1) \land ev_L(t_3) = ev_L(t_2))) \}. \tag{4.1.5} \]

GNI is not a trace property because the presence of any two traces $t_1$ and $t_2$ in
a system necessitates the presence of a third trace $t_3$.

Observational determinism [85,102] requires a system to appear deterministic
to a low user. Zdancewic and Myers's [130] definition of observational determinism
can be formulated as a hyperproperty:

\[ \mathbf{OD} \triangleq \{ T \in \mathsf{Prop} \mid (\forall t, t' \in T : t[0] =_L t'[0] \implies t \approx_L t') \}. \tag{4.1.6} \]

State equivalence relation $s =_L s'$ holds whenever states s and s' are indistinguishable
to a low user, and trace equivalence relation $t \approx_L t'$ holds whenever
traces t and t' are indistinguishable to a low user. Zdancewic and Myers define
trace equivalence in terms of state equivalence, requiring the sequence of states
in each trace to be equivalent up to both stuttering and prefix; equivalence up to
prefix makes their definition termination insensitive, that is, systems are allowed
to leak information via termination channels.6 OD is not a trace property because
whether some trace is allowed in a system depends on all the other traces
of the system.

6Zdancewic and Myers also require systems to be race free, hence they weaken trace equivalence
to hold for each memory location in a state in isolation, not over all memory locations
simultaneously.  We  omit  this  requirement  for  simplicity. 
Bisimulation-based definitions of information-flow security policies can also
be formulated as hyperproperties,7 which we demonstrate in chapter 5 with Focardi
and Gorrieri's [44] bisimulation nondeducibility on compositions (BNDC)
and with Boudol and Castellani's [18] definition of noninterference.

All information-flow security policies we investigated turned out to be hyperproperties,
not trace properties. This is suggestive, but any stronger statement
about the connection between information flow and hyperproperties
would require a formal definition of information-flow policies, and none is universally
accepted. Nonetheless, we believe that information flow is intrinsically
tied  to  correlations  between  (not  within)  executions.  And  hyperproperties  are 
sufficiently  expressive  to  formulate  such  correlations,  whereas  trace  properties 
are  not. 

Service  level  agreements.  A  service  level  agreement  (SLA)  specifies  acceptable 
performance  of  a  system.  Such  specifications  commonly  use  statistics  such  as 

•  mean  response  time,  the  mean  time  that  elapses  between  a  request  and  a 
response; 

•  time  service  factor,  the  percentage  of  requests  that  are  serviced  within  a 
specified  time;  and 

•  percentage uptime, the percentage of time during which the system is available
to accept and service requests.

7Since hyperproperties are trace-based, this might at first seem to contradict results, such as
Focardi and Gorrieri's [44], stating that bisimulation-based definitions are more expressive than
trace-based definitions. However, by employing a richer notion of state [105, §1.3] in traces than
Focardi and Gorrieri, hyperproperties are able to express bisimulations.

These statistics can be used to define policies with respect to individual executions
of a system or across all executions of a system. In the former case, the
SLA would be a trace property. For example, the policy "The mean response
time in each execution is less than 1 second" might not be satisfied by a system
if there are executions in which some response times are much greater than 1
second. Yet if these executions are rare, the system might still satisfy the policy
"The mean response time over all executions is less than 1 second." This latter
SLA is not a trace property, but it is a hyperproperty:

\[ \mathbf{RT} \triangleq \{ T \in \mathsf{Prop} \mid \mathit{mean}\Big(\bigcup_{t \in T} \mathit{respTimes}(t)\Big) < 1 \}. \tag{4.1.7} \]

Function mean(X) denotes the mean8 of a set X of real numbers, and
respTimes(t) denotes the set of response times (in seconds) from request and
response events in trace t. Policies derived from the other SLA statistics above
can similarly be expressed as hyperproperties.
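
The difference between the two readings of the mean-response-time policy can be seen on a small example. In this sketch each trace is abstracted to the tuple of its response times, which is an assumption of the example; the pooled mean follows (4.1.7) literally by taking the union of the per-trace sets.

```python
from statistics import mean

# Assumed encoding: each trace is abstracted to its response times in seconds.
def mean_per_execution_ok(T, bound=1.0):
    """Trace-property reading: every execution has mean response time below bound."""
    return all(mean(times) < bound for times in T)

def mean_over_all_executions_ok(T, bound=1.0):
    """Hyperproperty reading (4.1.7): mean of the union of all response times."""
    pooled = set().union(*(set(times) for times in T))
    return mean(pooled) < bound

system = [(0.2, 0.3), (0.1, 0.4), (0.5, 3.0)]   # the last execution is slow

print(mean_per_execution_ok(system))        # fails because of the slow execution
print(mean_over_all_executions_ok(system))  # holds once all executions are pooled
```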

4.1.4  Beyond  Hyperproperties? 

Hyperproperties are able to express security policies that trace properties cannot.
So it is natural to ask whether there are security policies that hyperproperties
cannot express. We have equated security policies with system properties,
and we chose to model systems as trace sets. Every property of trace sets is a
hyperproperty, so by definition hyperproperties are expressively complete for
our formulations of "system" and "security policy." To find security policies
that hyperproperties cannot express (if any exist), we would need to examine
alternative notions of systems and security policies. Alternative formulations
of systems are discussed in chapter 5, but all the formulations considered there
turn out to have encodings as trace sets; thus hyperproperties are complete for
those formulations. We do not know whether other formulations exist that do
not have such encodings.

8Since X might have infinite cardinality, RT requires a definition of the mean of an infinite
set (and, for some sets, this mean does not exist). We omit formalizing such a definition here;
one possibility is to use Cesàro means [54].

One  way  to  generalize  the  notion  of  a  security  policy  is  to  consider  policies 
on  sets  of  systems — for  example,  diversity  [100],  which  requires  the  systems  all  to 
implement  the  same  functionality  but  to  differ  in  their  implementation  details. 
Any  such  policy,  however,  could  be  modeled  as  a  hyperproperty  on  a  single 
system  that  is  a  product9  of  all  the  systems  in  the  set.  So  hyperproperties  again 
seem  to  be  sufficient. 


4.1.5  Logic  and  Hyperproperties 

We have not given a logic in which hyperproperties may be expressed. The examples
in this chapter require only second-order logic. Although higher-order
logic might also be useful to express hyperproperties, higher-order logic is reducible
to second-order logic [107, §6.2]. So we believe that second-order logic
is sufficient to express all hyperproperties. But we do not know whether the full
power of second-order logic is necessary to express hyperproperties of interest.
This has ramifications for verification of hyperproperties, because although full
second-order logic cannot be effectively and completely axiomatized, fragments
of it can be [14, §2.3].10

9The product of systems $T_1$ and $T_2$ can be defined as system
$\{ (t_1[0], t_2[0])(t_1[1], t_2[1]) \ldots \mid t_1 \in T_1 \land t_2 \in T_2 \}$, comprising traces over pairs of states.
Generalizing, the product of a set of n systems comprises traces over n-tuples of states.

10It is natural to ask whether we could further reduce second-order logic to first-order. Such a
reduction is possible, but only with the Henkin, rather than standard, semantics of second-order
logic [14, §4.2]. We do not know which of these semantics should be preferred for hyperproperties.
However, there are trace properties, and thus hyperproperties, that we conjecture cannot
be expressed in first-order logic: for example, the trace property containing the single trace
pqppqqpppqqq..., where p and q are states. This suggests that the standard semantics is appropriate.
4.1.6  Refinement  and  Hyperproperties

Programmers  use  stepwise  refinement  [1,9,33,40,71,126]  to  develop,  in  a  series  of 
steps,  a  program  that  implements  a  specification.  The  programmer  starts  from 
the specification. Each successive step creates a more concrete specification, ultimately
culminating in a specification sufficiently concrete that a computer can
execute it. To prove that the final concrete specification correctly implements the
original specification, the programmer argues at each step that the new concrete
specification refines the previous specification. Specification $S_1$ refines specification
$S_2$, denoted $S_1 \;\mathrm{REF}\; S_2$, iff every behavior permitted by $S_1$ is also permitted
by $S_2$; that is, the set of behaviors of $S_1$ is a subset of the set of behaviors of $S_2$.

Specifications  might  describe  behaviors  at  different  levels  of  abstraction.  For 
example,  a  specification  might  describe  behaviors  of  a  queue,  but  a  refinement 
of that specification might use an array to implement this behavior. Or a specification
might describe behaviors using critical sections, but a refinement might
implement  critical  sections  with  semaphores.  So  programmers  need  techniques 
to  relate  the  behaviors  described  by  specifications.  Abstraction  functions  [55,56] 
and  refinement  mappings  [1]  have  been  developed  for  this  purpose;  both  interpret 
concrete  behaviors  as  abstract  behaviors. 

Generalizing from these two techniques, let an interpretation function be a
function of type $\Psi \to \Psi$. Let IF be any class of interpretation functions that (like
abstraction functions and refinement mappings) is closed under composition
and contains the identity function id.11

11Abstraction functions must also preserve data type operations, and refinement mappings
must  preserve  externally  visible  components  up  to  stuttering.  But  these  restrictions  are  not 
relevant  to  our  discussion. 
An interpretation function $\alpha$ can be lifted to $\mathsf{Prop} \to \mathsf{Prop}$ by applying $\alpha$ to
each trace in a set:

\[ \alpha(T) \triangleq \{ \alpha(t) \mid t \in T \}. \]

System S $\alpha$-satisfies trace property P, denoted $S \models_\alpha P$, iff $\alpha(S) \models P$. Notation
$S \models P$, as we have used it so far, thus means that $S \models_{id} P$.

Trace property $P_1$ refines $P_2$ under interpretation $\alpha$, denoted $P_1 \;\mathrm{REF}_\alpha\; P_2$,
iff $\alpha(P_1) \subseteq P_2$. So for trace properties, satisfaction is the same relation as refinement,
and subset implies refinement; that is, if C is a subset of A, then C refines
A (under interpretation id). This implication is desirable, because it permits refinements
that resolve nondeterminism by removing traces from a system.

It is well-known that this kind of refinement does not generally work for security
policies.12 For example, recall system $\pi$ (chapter 1), which nondeterministically
chooses to output 0, 1, or the value of a secret bit h. System $\pi$ satisfies
the specification "The possible output values are independent of the values of
secrets," which can be formulated as a hyperproperty. But consider a system $\pi'$
that always outputs h. System $\pi'$ does not satisfy the specification and therefore
cannot refine $\pi$, yet $\pi' \subseteq \pi$. So subset does not imply refinement for hyperproperties
as it does for trace properties.

Hyperproperty $\mathbf{H}_1$ refines $\mathbf{H}_2$ under interpretation $\alpha$, denoted $\mathbf{H}_1 \;\mathrm{HREF}_\alpha\; \mathbf{H}_2$,
iff $\alpha(\mathbf{H}_1) \subseteq \mathbf{H}_2$, where $\alpha(\mathbf{H})$ is defined as $\{ \alpha(T) \mid T \in \mathbf{H} \}$. A natural
relationship that we would expect to hold is

\[ (\forall S \in \mathsf{Prop}, \mathbf{H} \in \mathbf{HP} : S \models \mathbf{H} \iff [S] \;\mathrm{HREF}_{id}\; \mathbf{H}), \tag{4.1.8} \]

because satisfaction and refinement intuitively should agree (as they did for
trace properties). Straightforward application of definitions shows that (4.1.8)
holds iff H is subset closed.

12Previous work has identified refinement techniques that are valid for use with certain
information-flow security policies [17,79,86].

Thus, perhaps unsurprisingly, the set of hyperproperties with which refinement
works is the set SSC of subset-closed hyperproperties:

\[ \mathbf{SSC} \triangleq \{ \mathbf{H} \in \mathbf{HP} \mid (\forall T \in \mathsf{Prop} : T \in \mathbf{H} \implies (\forall T' \in \mathsf{Prop} : T' \subseteq T \implies T' \in \mathbf{H})) \}. \]

The lifted trace properties are, of course, members of SSC. But SSC contains
more than just the lifted trace properties. For example, observational determinism
OD (4.1.6) is subset closed and therefore a member of SSC, but OD is not a
lifted trace property.


4.2  Hypersafety 

According  to  Alpern  and  Schneider  [5],  the  "bad  thing"  in  a  safety  property 
must  be  both 

•  finitely  observable,  meaning  its  occurrence  can  be  detected  in  finite  time, 
and 

•  irremediable,  so  its  occurrence  can  never  be  remediated  by  future  events. 

No-read-then-write  NRW  (4.1.1)  and  access  control  AC  (4.1.2)  are  both  safety. 
The  bad  thing  for  NRW  is  a  finite  trace  in  which  a  network  write  occurs  after  a 
file  read.  This  bad  thing  is  finitely  observable,  because  the  write  can  be  detected 
in  some  finite  prefix  of  the  trace,  and  irremediable,  because  the  network  write 
can  never  be  undone.  For  AC,  the  bad  thing  is  similarly  a  finite  trace  in  which 
an  operation  is  performed  without  appropriate  rights. 




For trace properties, a bad thing is a finite trace that cannot be a prefix of any
execution satisfying the safety property. A finite trace t is a prefix of a (finite or
infinite) trace t', denoted $t \leq t'$, iff $t' = t t''$ for some $t'' \in \Psi$.

Definition 4.1. A trace property S is a safety property [5] iff

\[ (\forall t \in \Psi_{inf} : t \notin S \implies (\exists m \in \Psi_{fin} : m \leq t \land (\forall t' \in \Psi_{inf} : m \leq t' \implies t' \notin S))). \]

Define SP to be the set of all safety properties; note that SP is itself a hyperproperty.

We generalize safety to hypersafety by generalizing the bad thing from a
finite trace to a finite13 set of finite traces. Define Obs to be the set of such observations:

\[ \mathsf{Obs} \triangleq \mathcal{P}^{fin}(\Psi_{fin}), \]

where $\mathcal{P}^{fin}(X)$ denotes the set of all finite subsets of set X. Prefix $\leq$ on sets of
traces is defined as follows:14

\[ T \leq T' \;\triangleq\; (\forall t \in T : (\exists t' \in T' : t \leq t')). \]

Note that this definition allows T' to contain traces that have no prefix in T.
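
A small sketch of prefix on trace sets, using tuples as finite stand-ins for traces (an assumption of the example, since the traces on the right-hand side may in general be infinite):

```python
def is_prefix(t, t2):
    """Finite trace t is a prefix of trace t2 (both tuples of states)."""
    return len(t) <= len(t2) and t2[:len(t)] == t

def is_set_prefix(T, T2):
    """T <= T2: every trace in T has an extension in T2."""
    return all(any(is_prefix(t, t2) for t2 in T2) for t in T)

M  = {("a",)}
T2 = {("a", "b"), ("c", "d")}      # ("c", "d") has no prefix in M

print(is_set_prefix(M, T2))        # every trace of M is extended in T2
print(is_set_prefix(T2, M))        # but not the other way round
```

The relation is deliberately one-directional: it constrains the traces of the smaller set only, so the empty observation is a prefix of every trace set.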
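
Definition 4.2. A hyperproperty S is a safety hyperproperty (is hypersafety) iff

\[ (\forall T \in \mathsf{Prop} : T \notin \mathbf{S} \implies (\exists M \in \mathsf{Obs} : M \leq T \land (\forall T' \in \mathsf{Prop} : M \leq T' \implies T' \notin \mathbf{S}))). \]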

13Infinite sets might seem to be an attractive alternative, and many of the results in the rest of
this chapter would still hold. However, the topological characterization given in §4.5 (specifically,
propositions 4.4 and 4.5) would be sacrificed.

14Other definitions of trace set prefix are possible, but inconsistent with our notion of observation.
We discuss this in §4.5.
The definition of hypersafety parallels the definition of safety, but the domains
involved now include an extra level of sets. Define SHP to be the set of all safety
hyperproperties.

Observational  determinism  OD  (4.1.6)  is  hypersafety.  The  bad  thing  is  a  pair 
of  traces  that  are  not  low-equivalent  despite  having  low-equivalent  initial  states. 
But  set  SP  of  all  safety  properties  is  not  hypersafety:  there  is  no  bad  thing  that 
prevents  an  arbitrary  trace  property  from  being  extended  to  a  safety  property. 

Safety  properties  lift  to  safety  hyperproperties: 

Proposition 4.1. $(\forall S \in \mathsf{Prop} : S \in \mathbf{SP} \iff [S] \in \mathbf{SHP})$.

Proof. In appendix 4.A. □

Refinement of hypersafety. Stepwise refinement works with all safety hyperproperties,
because safety hyperproperties are subset closed (cf. §4.1.6), as
stated by the following theorem.

Theorem 4.1. $\mathbf{SHP} \subset \mathbf{SSC}$.

Proof. In appendix 4.A. □

A consequence of theorem 4.1 is that any hyperproperty that is not subset
closed cannot be hypersafety. For example, generalized noninterference
GNI (4.1.5) is not subset closed: a system containing traces $t_1$ and $t_2$ and interleaved
trace $t_3$ might satisfy GNI, but the subset containing only $t_1$ and $t_2$
would not satisfy GNI. Thus GNI cannot be hypersafety.
4.3  Beyond  2-Safety


Safety  properties  enjoy  a  relatively  complete  verification  methodology  based  on 
invariance  arguments  [6].  Although  we  have  not  obtained  such  a  methodology 
for hypersafety, we can use invariance arguments to verify a class of safety hyperproperties
by generalizing recent work on verification of secure information
flow. 

Recall that secure information flow is a hyperproperty but not a trace property.
Recent work gives system transformations that reduce verifying secure
information flow15 to verifying a safety property of some transformed system:
Pottier and Simonet [99] develop a type system for verifying secure information
flow based on simultaneous reasoning about two executions of a program.
Darvas et al. [34] show that secure information flow can be expressed in dynamic
logic. Barthe et al. [12] give an equivalent formulation for Hoare logic
and temporal logic, based on a self-composition construction.

Define the sequential self-composition of P as the program P; P', where P' denotes
program P, but with every variable renamed to a fresh, primed variable
(for example, variable x is renamed to x'). One way to verify that P exhibits secure
information flow is to establish the following trace property of transformed
program P; P':

    If for every low variable l, before execution l = l' holds, then when
    execution terminates l = l' still holds, no matter what the values of
    high variables were.
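
The idea can be sketched for programs modeled as Python functions from initial values of one low variable and one high variable to the final value of the low variable. This modeling, the two example programs, and the brute-force check over a finite domain are all assumptions of the sketch; the cited works establish the property with type systems or program logics rather than enumeration.

```python
from itertools import product

def secure_program(l, h):
    return l + 1                 # final low value ignores h

def leaky_program(l, h):
    return l + (h % 2)           # final low value depends on h

def self_compose(program):
    """P; P': run P, then a copy of P on fresh (primed) variables."""
    def composed(l, h, l_primed, h_primed):
        return program(l, h), program(l_primed, h_primed)
    return composed

def exhibits_secure_flow(program, lows, highs):
    """Brute-force check on a finite domain: whenever l = l' holds initially,
    l = l' holds on termination, for all values of h and h'."""
    composed = self_compose(program)
    for l, h, h_primed in product(lows, highs, highs):
        final_l, final_l_primed = composed(l, h, l, h_primed)
        if final_l != final_l_primed:
            return False
    return True

print(exhibits_secure_flow(secure_program, range(3), range(3)))
print(exhibits_secure_flow(leaky_program, range(3), range(3)))
```

The check inspects one execution of the composed program at a time, which is what turns a statement about pairs of executions of P into a trace property of P; P'.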

15These reductions are possible because the particular formulations of secure information
flow used in each work are actually hypersafety. A formulation that is hyperliveness (which
would include all possibilistic information-flow policies, as discussed in §4.4) would not be
amenable to these reductions.
Barthe et al. generalize the self-composition operator from sequential composition
to any operator that satisfies certain conditions, and they note that parallel
composition  satisfies  these  conditions.  They  also  relax  the  equality  constraints 
in  the  above  trace  property  to  partial  equivalence  relations.  Terauchi  and 
Aiken  [115]  further  generalize  the  applicability  of  self-composition  by  showing 
that  it  can  be  used  to  verify  any  2-safety  property,  which  they  define  informally 
as  a  "property  that  can  be  refuted  by  observing  two  finite  traces." 

Using hyperproperties, we can show that the above results are special cases
of a more general theorem. Define a k-safety hyperproperty as a safety hyperproperty
in which the bad thing never involves more than k traces:

Definition 4.3. A hyperproperty S is a k-safety hyperproperty (is k-safety) iff

\[ (\forall T \in \mathsf{Prop} : T \notin \mathbf{S} \implies (\exists M \in \mathsf{Obs} : M \leq T \land |M| \leq k \land (\forall T' \in \mathsf{Prop} : M \leq T' \implies T' \notin \mathbf{S}))). \]

This is just the definition of hypersafety with an added conjunct "$|M| \leq k$". For
a given k, define KSHP(k) to be the set of all k-safety hyperproperties.

As an example of a k-safety hyperproperty for any k, consider a system that
stores a secret by splitting it into k shares. Suppose that an action of the system
is to output a share. Then a hyperproperty of interest might be that the system
cannot, across all of its executions, output all k shares (thereby outputting sufficient
information for the secret to be reconstructed). We denote this k-safety
hyperproperty as $\mathbf{SecS}_k$.

The 1-safety hyperproperties are the lifted safety properties, that is,

\[ \mathbf{KSHP}(1) = \{ [S] \mid S \in \mathbf{SP} \}, \]

since the bad thing for a safety property is a single trace. Thus "1-safety" and
"safety" are synonymous.
The Terauchi and Aiken definition of 2-safety properties is limited to deterministic
programs that are expressed in a relational model of execution (which
we address further in §5.2), and it ignores nonterminating traces. So their 2-safety
properties are a strict subset of the 2-safety hyperproperties, KSHP(2).
For example, observational determinism OD (4.1.6) is not a 2-safety property,
but it is a 2-safety hyperproperty.

Define the parallel self-composition of system S as the product system $S \times S$
consisting of traces over $\Sigma \times \Sigma$:

\[ S \times S \triangleq \{ (t[0], t'[0])(t[1], t'[1]) \ldots \mid t \in S \land t' \in S \}. \]

Define the k-product of S, denoted $S^k$, to be the k-fold parallel self-composition
of S, comprising traces over $\Sigma^k$. Self-composition $S \times S$ is equivalent to
2-product $S^2$.

Previous work has shown how to reduce a particular formulation of noninterference
of system S to a related safety property of $S^2$ [12], and how to reduce
any 2-safety hyperproperty of S to a related safety property of S; S [115]. The
following theorem generalizes those results. Let Sys be the set of all systems.
For any system S, any k-safety hyperproperty $\mathbf{K}$ of S can be reduced to a safety
property K of $S^k$, and the proof of the following theorem (in appendix 4.A)
shows how to construct K from $\mathbf{K}$:

Theorem 4.2. $(\forall S \in \mathsf{Sys}, \mathbf{K} \in \mathbf{KSHP}(k) : (\exists K \in \mathbf{SP} : S \models \mathbf{K} \iff S^k \models K))$.

Proof. In appendix 4.A. □

Theorem 4.2 provides a verification technique for k-safety: reduce a k-safety
hyperproperty to a safety property, then verify that the safety property is satisfied
by $S^k$ using an invariance argument. Since invariance arguments are relatively
complete for safety properties [6], this methodology is relatively complete
for k-safety.
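
The reduction can be illustrated for k = 2 with a simplified, position-by-position form of observational determinism. The sketch below is not the construction from the proof in appendix 4.A; the state encoding, the equal-length finite traces, and the property checked on the product are assumptions chosen to keep the example small.

```python
from itertools import product

def k_product(S, k):
    """S^k: traces over k-tuples of states, built from equal-length finite traces."""
    return {tuple(zip(*traces)) for traces in product(S, repeat=k)}

def low(state):
    return state[0]              # assumed encoding: state = (low, high)

def pairwise_low_agreement(pair_trace):
    """A property of a single trace of S^2: if the two initial states are
    low-equivalent, then so are the two states at every position."""
    first_left, first_right = pair_trace[0]
    if low(first_left) != low(first_right):
        return True
    return all(low(left) == low(right) for left, right in pair_trace)

def satisfies_by_self_composition(S):
    # Checks each trace of S^2 independently of the others.
    return all(pairwise_low_agreement(pt) for pt in k_product(S, 2))

deterministic_low = {((0, 0), (1, 0)), ((0, 1), (1, 1))}
leaks_high        = {((0, 0), (0, 0)), ((0, 1), (1, 1))}

print(satisfies_by_self_composition(deterministic_low))
print(satisfies_by_self_composition(leaks_high))
```

Each trace of the product pairs two executions of the original system, so a violation that needs two traces of S shows up within one trace of $S^2$.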

However, theorem 4.2 does not provide the relatively complete verification
procedure we seek for hypersafety, because there are safety hyperproperties that
are not k-safety for any k. For example, consider the hyperproperty "for any k,
a system cannot output all k shares of a secret from a k-secret sharing":

\[ \mathbf{SecS} \triangleq \bigcup_k \mathbf{SecS}_k. \tag{4.3.1} \]

SecS is not k-safety for any k. Yet it is hypersafety, since any trace property not
contained in it violates some $\mathbf{SecS}_k$.

4.4  Hyperliveness 

Alpern  and  Schneider  [5]  characterize  the  "good  thing"  in  a  liveness  property  as 

•  always  possible,  no  matter  what  has  occurred  so  far,  and 

•  possibly  infinite,  so  it  need  not  be  a  discrete  event. 

For example, guaranteed service GS (4.1.3) is a liveness property in which the
good thing is the eventual response to a request. This good thing is always possible,
because a state in which a response is produced can always be appended
to any finite trace containing a request. And this good thing is not infinite because
the response is a discrete event, but starvation freedom, which stipulates
that a system makes progress infinitely often, is an example of a liveness property
with an infinite good thing.




Formally,  a  good  thing  is  an  infinite  suffix  of  a  finite  trace: 

Definition 4.4. Trace property L is a liveness property [5] iff

\[ (\forall t \in \Psi_{fin} : (\exists t' \in \Psi_{inf} : t \leq t' \land t' \in L)). \]

Define LP to be the set of all liveness properties. Not surprisingly, LP is a hyperproperty.

Just as with hypersafety, we generalize liveness to hyperliveness by generalizing
a finite trace to a finite set of finite traces. The definition of hyperliveness
is essentially the same as the definition of liveness, except for an additional level
of sets:

Definition 4.5. Hyperproperty L is a liveness hyperproperty (is hyperliveness) iff

\[ (\forall T \in \mathsf{Obs} : (\exists T' \in \mathsf{Prop} : T \leq T' \land T' \in \mathbf{L})). \]

Define  LHP  to  be  the  set  of  all  liveness  hyperproperties. 

Mean response time RT (4.1.7) is not liveness but it is hyperliveness: the
good thing is that the mean response time is low enough. Given any observation
T with any mean response time, it is always possible to extend T, such that
the resulting system has a low enough mean response time, by adding a trace
that has many quick responses. Note that if this policy were approximated by
limiting the maximum response time in each execution, the resulting
hyperproperty would be a lifted safety property.
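
The extension argument can be illustrated with a small Python sketch. It assumes a
simplified, hypothetical model that is not part of the formal development: each trace
is abstracted to the tuple of response times it exhibits, an observation is a set of
such tuples, and the names `mean_response_time` and `extend_until_fast` are purely
illustrative.

```python
def mean_response_time(traces):
    """Mean of all response times across all traces (infinity if none recorded)."""
    times = [rt for trace in traces for rt in trace]
    return sum(times) / len(times) if times else float("inf")


def extend_until_fast(observation, bound, quick=1, burst=100):
    """Add traces of quick responses until the mean is at most `bound`.

    Assumes quick < bound, so the mean can always be driven low enough.
    """
    if quick >= bound:
        raise ValueError("quick must be strictly below bound")
    extended = set(observation)
    n = 0
    while mean_response_time(extended) > bound:
        n += 1
        # Successive added traces have different lengths, so they are
        # distinct from one another.
        extended.add((quick,) * (burst * n))
    return extended


observation = {(50, 70), (90,)}
extended = extend_until_fast(observation, bound=5)
```

Because the sketch only adds traces, the result is a superset of the observation, which
is a special case of extension under trace set prefix.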

Set LP of all liveness properties is a liveness hyperproperty: every observation
can be extended to any liveness property. Similarly, set SP of all safety
properties is a liveness hyperproperty: every observation can be extended to a
safety property (whose bad thing is "not beginning execution with one of the
finite traces in the observation").

The only hyperproperty that is both hypersafety and hyperliveness is true,
defined as Prop. The hyperproperty false, defined as $\{\emptyset\}$, is hypersafety but not
hyperliveness.16

Liveness  properties  lift  to  liveness  hyperproperties: 

Proposition 4.2. $(\forall L \in Prop : L \in LP \iff [L] \in LHP)$.

Proof. In appendix 4.A. □

Possibilistic information flow. Some information-flow security policies, such
as observational determinism OD (4.1.6), restrict nondeterminism of a system
from being publicly observable. However, observable nondeterminism might
be useful, for a couple of reasons. First, systems might exhibit nondeterminism
due to scheduling. If the scheduler cannot be influenced by secret information
(i.e., the scheduler does not serve as a covert timing channel), it is reasonable
to allow the scheduler to behave nondeterministically. Second, nondeterminism
is a useful modeling abstraction when dealing with probabilistic systems
(which we consider in more detail in §5.5). When the exact probabilities for a
system are unknown, they can be abstracted by nondeterminism. For at least
these reasons, there is a history of research on possibilistic information-flow
security policies, beginning with nondeducibility [113] and generalized
noninterference [81]. Such policies are founded on the intuition that low observers of
a system should gain little from their observations. Typically, these policies
require that every low observation is consistent with some large set of possible
high behaviors.

16 The false property is the empty set of traces, so it might seem reasonable to define false as
the empty set of trace sets. But then the lift of the false property would not equal false. Note
that false is not satisfied by any system because, by definition, $\emptyset$ is not a system.

McLean [86] shows that possibilistic information-flow policies can be
expressed as trace sets that are closed with respect to selective interleaving
functions. Such functions, given two executions of a system, specify another trace
that must also be an execution of the system, as did the definition of
generalized noninterference GNI (4.1.5). Mantel [78] generalizes from these functions
to closure operators, which extend a set S of executions to a set S' such that $S \subseteq S'$.
Mantel argues that every possibilistic information-flow policy can be expressed
as a closure operator.

Given a closure operator Cl that expresses a possibilistic information-flow
policy, the hyperproperty $\mathbf{P}_{Cl}$ induced by Cl is

$$\mathbf{P}_{Cl} = \{Cl(T) \mid T \in Prop\}.$$

Define the set PIF of all such hyperproperties to be $\bigcup_{Cl} \{\mathbf{P}_{Cl}\}$. It is now easy to see
that these are liveness hyperproperties: any observation T can be extended to
its closure.

Theorem 4.3. $PIF \subset LHP$.

Proof. In appendix 4.A. □

Possibilistic  information-flow  policies  are  therefore  never  hypersafety.17 
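
To make "extended to its closure" concrete, here is a minimal Python sketch built on
a deliberately simple, hypothetical closure operator. It is neither McLean's selective
interleaving functions nor one of Mantel's operators: traces are abstracted to
(high input, low output) pairs, and the closure pairs every observed high input with
every observed low output.

```python
def closure(traces):
    """Toy closure: pair every observed high input with every observed low output."""
    highs = {high for high, _ in traces}
    lows = {low for _, low in traces}
    return {(high, low) for high in highs for low in lows}


T = {(0, "x"), (1, "y")}
closed = closure(T)
```

The operator only ever adds traces, so T is a subset of its closure, and applying it a
second time changes nothing. The subset relationship is what lets any observation be
extended into the induced hyperproperty.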

Temporal logics. Consider the hyperproperty "For every initial state, there is
some terminating trace, but not all traces must terminate," denoted as NNT. In
branching-time temporal logic, NNT could be expressed as

$$\Diamond\, terminates, \qquad (4.4.1)$$

where terminates is a state predicate and $\Diamond$ is the "not never" operator.18 There
is no linear-time temporal predicate that expresses NNT, nor is there a liveness
property equivalent to NNT [69]; an approximation would be a linear-time
predicate, or a liveness property, that requires every trace to terminate. However,
NNT is hyperliveness because any finite trace can be extended to a set of
executions such that at least one execution terminates.

17 Another way to reach this conclusion is to observe that closure operators need not yield
hyperproperties that are subset closed, yet, by theorem 4.1, every safety hyperproperty is subset
closed.

This example suggests a relationship between hyperproperties and
branching-time temporal predicates, and between trace properties and linear-time
temporal predicates. We can make this relationship precise by examining
the semantics of temporal logic. In both branching time and linear time,
a semantic model contains a set of states and a valuation function assigning
a Boolean value to each atomic proposition in each state. Additionally, a
branching-time model requires a current state and a set of traces, whereas a
linear-time model requires a single trace [41]. These requirements differ because
a linear-time predicate is a property of a trace, whereas a branching-time
predicate is a property of a state and all the future traces that could proceed from that
state. Thus, trace properties model linear-time predicates, and hyperproperties
model branching-time predicates for a given state.

Moreover, hyperproperties can express policies that branching-time
predicates cannot. Consider the trace property "Every trace must end with an
infinite number of good states," denoted SAG, where good is a state predicate. In
linear-time temporal logic, SAG could be expressed as

$$\Diamond\Box\, good, \qquad (4.4.2)$$

where $\Diamond$ is the "sometime" operator and $\Box$ is the "always" operator. SAG is
liveness and thus hyperliveness, but there is no branching-time predicate that
expresses it [69].

18 Temporal logic CTL [27] would express formula (4.4.1) as EF terminates.

4.5  Topology 

Topology enables an elegant characterization of the structure of hyperproperties,
just as it did for trace properties. We begin by summarizing the topology
of trace properties [110].

Consider  an  observer  of  an  execution  of  a  system,  who  is  permitted  to  see 
each  new  state  as  it  is  produced  by  the  system;  otherwise,  the  system  is  a  black 
box  to  the  observer.  The  observer  attempts  to  determine  whether  trace  property 
P  holds  of  the  system.  At  any  point  in  time,  the  observer  has  seen  only  a  finite 
prefix  of  the  (infinite)  execution.  Thus,  the  observer  should  declare  that  the 
system  satisfies  P,  after  observing  finite  trace  t,  only  if  all  possible  extensions  of 
t  will  also  satisfy  P.  Abramsky  names  such  properties  observable  [3]. 

Like the bad thing for a safety property, an observable property must be
detectable in finite time and, once detected, hold thereafter. Formally, O is an
observable property iff

$$(\forall t \in \Psi_{inf} : t \in O \implies (\exists m \in \Psi_{fin} : m \le t
    \wedge (\forall t' \in \Psi_{inf} : m \le t' \implies t' \in O))).$$

Define $\mathcal{O}$ to be the set of observable properties. This set satisfies two closure
conditions. First, if $O_1, \ldots, O_n$ are observable, then $\bigcap_{i=1}^{n} O_i$ is also observable.
Second, if $\mathbb{O}$ is a (potentially infinite) set of observable properties, then $\bigcup_{O \in \mathbb{O}} O$
is also observable. Thus $\mathcal{O}$ is closed under finite intersections and infinite unions.

A topology on a set $\mathcal{S}$ is a set $\mathcal{T} \subseteq \mathcal{P}(\mathcal{S})$ such that $\mathcal{T}$ is closed under finite
intersections and infinite unions. Because $\mathcal{O}$ is so closed, it is a topology on $\Psi_{inf}$. We
name $\mathcal{O}$ the Plotkin topology, because Plotkin proposed its use in characterizing
safety and liveness [5].19

The elements of a topology $\mathcal{T}$ are called its open sets. A convenient way to
characterize the open sets of a topology is in terms of a base or a subbase. A base
of topology $\mathcal{T}$ is a set $\mathcal{B} \subseteq \mathcal{T}$ such that every open set is a (potentially infinite)
union of elements of $\mathcal{B}$. A subbase is a set $\mathcal{A} \subseteq \mathcal{T}$ such that the collection of finite
intersections of $\mathcal{A}$ is a base for $\mathcal{T}$. The set

$$\mathcal{O}_B \triangleq \{\uparrow t \mid t \in \Psi_{fin}\}$$

is a base (and a subbase) of the Plotkin topology, where

$$\uparrow t = \{t' \in \Psi_{inf} \mid t \le t'\}$$

is the completion of a finite trace t. When $t \le t'$ we say that t' extends t. The
completion of t is thus the set of all infinite extensions of t.
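
Membership in a completion depends only on a finite prefix, which is what makes these
sets observable. The following Python sketch illustrates this under an assumed
representation that is not from the text: an infinite trace is modeled as a
zero-argument function returning a fresh iterator of states, and `stutter` is a
hypothetical helper for building such traces.

```python
from itertools import chain, islice, repeat


def in_completion(t, infinite_trace):
    """Check whether an infinite trace lies in the completion of finite trace t.

    `infinite_trace` is a zero-argument function returning a fresh iterator
    over the states of the infinite trace.
    """
    return tuple(islice(infinite_trace(), len(t))) == tuple(t)


def stutter(prefix, last):
    """Hypothetical helper: the infinite trace `prefix` followed by `last` forever."""
    return lambda: chain(prefix, repeat(last))


ab_then_c = stutter(("a", "b"), "c")
```

Only the first `len(t)` states are ever inspected, so the check terminates even though
the trace itself is infinite.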

Alpern and Schneider [5] noted that, in the Plotkin topology, safety properties
correspond to closed sets and liveness properties correspond to dense sets.
A closed set is the complement (with respect to $\mathcal{S}$) of an open set. If a trace t is
not a member of a closed set C, there is some bad thing (specifically, the prefix
m of t in the definition of observable as instantiated on open set $\overline{C}$, the
complement of C) that is to blame; the existence of such bad things makes C a safety
property. Likewise, a set that is dense intersects every non-empty open set in $\mathcal{T}$.
So for any finite trace t and dense set D, the intersection of $\uparrow t$ (which is open
because it is a member of $\mathcal{O}_B$) and D is nonempty. Since any finite trace can be
extended to be in D, it holds that D is a liveness property.

19 Topology $\mathcal{O}$ is also the Scott topology on the $\omega$-algebraic CPO of traces ordered by $\le$ [110].

We want to construct a topology on sets of traces that extends this
correspondence to hyperproperties. The most important step is generalizing the notion of
finite observability from trace properties to hyperproperties. In fact, this
generalization was already accomplished in §4.2, where a bad thing was generalized
from a finite trace to a finite set of finite traces, that is, an observation. The
observer, as before, sees the system produce each new state in the execution.
However, the observer may now reset the system at any time, causing it to
begin a new execution. At any finite point in time, the observer has now collected
a finite set of finite (thus partial) executions. An observation is thus an element
of Obs, as defined in §4.2.

An extension of an observation should allow the observer to perform
additional resets of the system, yielding a larger set of traces. An extension should
also allow each execution to proceed longer, yielding longer traces. So extension
corresponds to trace set prefix $\le$ (cf. §4.2). The completion of observation M is

$$\uparrow M = \{T \in Prop \mid M \le T\}.$$

We can now define our topology on sets of traces in terms of its subbase:

$$\boldsymbol{\mathcal{O}}_{SB} \triangleq \{\uparrow M \mid M \in Obs\}.$$

The base $\boldsymbol{\mathcal{O}}_B$ of our topology is then $\boldsymbol{\mathcal{O}}_{SB}$ closed under finite intersections. The
base and subbase turn out to be the same sets:

Proposition 4.3. $\boldsymbol{\mathcal{O}}_B = \boldsymbol{\mathcal{O}}_{SB}$.

Proof. In appendix 4.A. □

Finally, our topology $\boldsymbol{\mathcal{O}}$ is $\boldsymbol{\mathcal{O}}_B$ closed under infinite unions.

Define $\boldsymbol{\mathcal{C}}$ to be the closed sets in our topology and $\boldsymbol{\mathcal{D}}$ to be the dense sets.
Just as safety and liveness correspond to closed and dense sets in the Plotkin
topology, hypersafety and hyperliveness correspond to closed and dense sets in
our generalization of that topology:

Proposition 4.4. $SHP = \boldsymbol{\mathcal{C}}$.

Proof. In appendix 4.A. □

Proposition 4.5. $LHP = \boldsymbol{\mathcal{D}}$.

Proof. In appendix 4.A. □

Our topology $\boldsymbol{\mathcal{O}}$ is actually equivalent to a well-known topology. The Vietoris
(or finite or convex Vietoris) topology is a standard construction of a topology on
sets out of an underlying topology [87, 116]. Our underlying topology was on
traces, and we constructed a topology on sets of traces. The Vietoris construction
can be decomposed into the lower Vietoris and upper Vietoris constructions [109],
which also yield topologies. Let $\mathcal{V}_L(\mathcal{T})$ denote the lower Vietoris construction,
which given underlying topology $\mathcal{T}$ on space $\mathcal{X}$ produces the topology on $\mathcal{P}(\mathcal{X})$
induced by subbase $\mathcal{V}_L^{SB}(\mathcal{T})$:

$$\mathcal{V}_L^{SB}(\mathcal{T}) \triangleq \{\langle O \rangle \mid O \in \mathcal{T}\},$$

where $\langle T \rangle$ is defined20 as

$$\langle T \rangle = \{U \in \mathcal{P}(\mathcal{X}) \mid U \cap T \ne \emptyset\}.$$
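
The difference between lifting and the angle-bracket operator is easy to see on a small
finite universe. The following Python sketch is an illustration only: real trace
properties are sets of infinite traces, whereas here traces are opaque labels, and the
function names `lift` and `diamond` are chosen for this example.

```python
from itertools import combinations


def powerset(traces):
    """All subsets of a finite set of traces, as frozensets."""
    items = sorted(traces)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}


def lift(T):
    """[T]: every trace property that refines (is a subset of) T."""
    return powerset(T)


def diamond(T, universe):
    """<T>: every trace property over `universe` that shares a trace with T."""
    return {U for U in powerset(universe) if U & frozenset(T)}


universe = {"t1", "t2", "t3"}
T = {"t1", "t2"}
```

For this universe, `lift(T)` contains the four subsets of T, while `diamond(T, universe)`
contains every subset of the universe except the empty set and `{"t3"}`.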

20 Operators $[\cdot]$ (from §4.1) and $\langle \cdot \rangle$ are similar to modal logic operators $\Box$ (necessity) and $\Diamond$
(possibility): For trace property T, lift [T] denotes the set of all refinements of T, that is, the
hyperproperty in which T is necessary. Similarly, $\langle T \rangle$ denotes the set of all trace properties that
share a trace with T, that is, the hyperproperty in which T is always possible.

The following theorem states that our topology is equivalent to the lower
Vietoris construction applied to the Plotkin topology:

Theorem 4.4. $\boldsymbol{\mathcal{O}} = \mathcal{V}_L(\mathcal{O})$.

Proof. In appendix 4.A. □

Smyth [109] established that the lower Vietoris topology is equivalent to the
lower (or Hoare) powerdomain, which is a construction used to model the
semantics of nondeterminism [98]. So our topology embodies the same intuition about
nondeterminism as the lower powerdomain does.

The proof of theorem 4.4 yields another topological characterization of safety
hyperproperties: the set of lifted safety properties, closed under infinite
intersections and finite unions (denoted as closure operator $Cl_C$, because these
closure conditions characterize a topology of Closed sets), is the set of safety
hyperproperties.

Proposition 4.6. $SHP = Cl_C(\{[S] \mid S \in SP\})$.

Proof. In appendix 4.A. □

Defining trace set prefix. Recall that trace set prefix $\le$ is defined as follows:

$$T \le T' = (\forall t \in T : (\exists t' \in T' : t \le t')).$$

For clarity, we use $\le_L$ instead of $\le$ to refer to that definition throughout the rest
of this section (L stands for Lower Vietoris).

Two natural alternatives to $\le_L$ are

$$T \le_U T' = (\forall t' \in T' : (\exists t \in T : t \le t')),$$

$$T \le_C T' = T \le_L T' \wedge T \le_U T'.$$

(U and C stand for Upper and Convex Vietoris. These prefix relations
correspond to the eponymous topologies.) However, both alternatives turn out to be
unsuitable for our purposes, because they do not correspond to our intuition
about finite observability, as we now explain.

Hyperproperty $\mathbf{O}$ is observable iff

$$(\forall T \in Prop : T \in \mathbf{O} \implies (\exists M \in Obs : M \le T
    \wedge (\forall T' \in Prop : M \le T' \implies T' \in \mathbf{O}))).$$

Consider using $\le_U$ for trace set prefix $\le$. For a concrete example, suppose that
$\Sigma = \{a, b, c\}$, $\mathbf{O}$ is observable, $T \in \mathbf{O}$, and $M = \{a, b\}$. Any T' such that $M \le_U T'$
must be a member of $\mathbf{O}$. Every trace t' in T' must begin with either a or b and
cannot begin with c. In particular, T' might contain traces beginning only with
b, never with a. Observation M therefore characterizes a system in which a
nondeterministic choice to produce c as the first state is not possible. So with $\le_U$, an
observation records what nondeterminism is denied, and all future extensions
of that observation are also required to deny that nondeterminism.

In contrast, with $\le_L$ (i.e., our topology), an observation records what
nondeterminism has so far been permitted, and all future extensions of that
observation are required also to permit that nondeterminism. Our intuition is that
observers of a black-box system can observe permitted nondeterminism (by
observing states produced by the system) but not denied nondeterminism. The
definition of $\le_U$ does not correspond to that intuition, but the definition of $\le_L$
does. Similarly, using $\le_C$ for trace set prefix leads to observations that record
both permitted and denied nondeterminism (because $\le_C$ is the conjunction of
$\le_L$ and $\le_U$), and therefore $\le_C$ does not correspond to our intuition, either.

So neither the upper nor the convex Vietoris topology enjoys open sets that
are the observable hyperproperties; consequently, the equivalence of closed sets
and hypersafety is lost. Nonetheless, these topologies might be useful for other
purposes, for example, in refusal semantics for CSP [57].
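
The three candidate prefix relations can be compared directly on finite data. The
Python sketch below is a simplification: it handles only finite traces (written as
strings over the state space {a, b, c}) and finite trace sets, whereas the definitions
in this section also range over infinite traces. The function names `le_lower`,
`le_upper`, and `le_convex` are illustrative.

```python
def is_prefix(t, u):
    """Trace prefix on finite traces represented as strings."""
    return u[:len(t)] == t


def le_lower(T, U):
    """Lower prefix: every trace in T has an extension in U."""
    return all(any(is_prefix(t, u) for u in U) for t in T)


def le_upper(T, U):
    """Upper prefix: every trace in U extends some trace in T."""
    return all(any(is_prefix(t, u) for t in T) for u in U)


def le_convex(T, U):
    """Convex prefix: the conjunction of the lower and upper relations."""
    return le_lower(T, U) and le_upper(T, U)


M = {"a", "b"}
only_b = {"ba", "bb"}        # never begins with a
with_c = {"ab", "ba", "ca"}  # additionally permits c as first state
both = {"ab", "bc"}
```

Here `only_b` extends M under the upper relation but not the lower one, since the
permitted choice a has been dropped. Conversely, `with_c` extends M under the lower
relation but not the upper one, since it adds a first state that M never exhibited.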

4.6  Beyond  Hypersafety  and  Hyperliveness 

Security policies can exhibit features of both safety and liveness. For example,
consider a policy on a medical information system that must maintain the
confidentiality of patient records and must also eventually notify patients whenever
their records are accessed [8]. If the confidentiality requirement is interpreted
as observational determinism OD (4.1.6), this system must both prevent bad
things (OD, which is hypersafety) as well as guarantee good things (eventual
notification, which can be formulated as liveness). As another example,
consider an asynchronous proactive secret-sharing system [132] that must
maintain and periodically refresh a secret. Each share refresh must complete
during a given time interval with high probability. Maintaining the confidentiality
of the secret can be formulated as SecS (4.3.1), which is hypersafety. The
eventual refresh of the secret shares can be formulated as liveness: every
execution eventually completes the refresh if enough servers remain uncompromised.
And the high probability that the refresh succeeds within a given time
interval is hyperliveness, similar to mean response time RT (4.1.7). Both of these
examples illustrate hyperproperties that are intersections of (hyper)safety and
(hyper)liveness.

Figure 4.1: Classification of security policies

In fact, as stated by the following theorem, every hyperproperty is the
intersection of a safety hyperproperty with a liveness hyperproperty. This theorem
generalizes the result of Alpern and Schneider [5] that every trace property is
the intersection of a safety property and a liveness property:

Theorem 4.5. $(\forall \mathbf{P} \in HP : (\exists \mathbf{S} \in SHP, \mathbf{L} \in LHP : \mathbf{P} = \mathbf{S} \cap \mathbf{L}))$.

Proof. In appendix 4.A. □
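
The topological results of §4.5 suggest one way such a decomposition can arise. The
following sketch uses the standard closure-based argument from general topology; it is
offered as an illustration and is not claimed to be the construction used in the
appendix proof. The notation $\mathrm{cl}(\mathbf{P})$, for the smallest closed set containing $\mathbf{P}$ in the
topology of §4.5, is introduced here for the sketch.

```latex
\begin{align*}
\mathbf{S} &= \mathrm{cl}(\mathbf{P}), \\
\mathbf{L} &= \mathbf{P} \cup (Prop \setminus \mathrm{cl}(\mathbf{P})), \\
\mathbf{S} \cap \mathbf{L}
  &= (\mathrm{cl}(\mathbf{P}) \cap \mathbf{P})
     \cup (\mathrm{cl}(\mathbf{P}) \cap (Prop \setminus \mathrm{cl}(\mathbf{P}))) \\
  &= \mathbf{P} \cup \emptyset = \mathbf{P}.
\end{align*}
```

Under this sketch, $\mathbf{S}$ is closed and so is hypersafety by proposition 4.4. And $\mathbf{L}$ is dense,
hence hyperliveness by proposition 4.5: a non-empty open set either meets $\mathrm{cl}(\mathbf{P})$, in
which case it also meets $\mathbf{P}$, or lies entirely inside $Prop \setminus \mathrm{cl}(\mathbf{P})$.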


4.7  Summary 

This chapter has classified several security policies with hypersafety and
hyperliveness. Figure 4.1 summarizes this classification.

We have introduced hyperproperties, which are sets of trace properties and
can express security policies that trace properties cannot, such as secure
information flow and service level agreements. We have generalized safety and
liveness to hyperproperties, showing that every hyperproperty is the intersection of
a safety hyperproperty and a liveness hyperproperty. We have also generalized
the topological characterization of safety and liveness from trace properties to
hyperproperties. We have shown that refinement is applicable with safety
hyperproperties.

We have given a relatively complete verification methodology for k-safety
hyperproperties that generalizes prior techniques for verifying secure
information flow. But we do not know whether there is a relatively complete
methodology for all hyperproperties, or even all safety hyperproperties.21 If such a
methodology could be found, security might take its place as "just another"
functional requirement to be verified.


21 If the full power of second-order logic is necessary to express hyperproperties (as discussed
at the end of §4.1), such methods could not exist. Nonetheless, methods for verifying fragments
of the logic might suffice for verifying hyperproperties that correspond to security policies.

4.A Appendix: Proofs


Bueno and Clarkson [20] have formally verified propositions 4.1 and 4.2,
theorems 4.2, 4.3, and 4.5, and an analogue of theorem 4.1 using the Isabelle/HOL
proof assistant [95]. We believe that the remaining proofs could also be formally
verified.

Proposition 4.1. $(\forall S \in Prop : S \in SP \iff [S] \in SHP)$.

Proof. By mutual implication.

(⇒) Let S be an arbitrary safety property. We want to show that [S] is a safety
hyperproperty, that is, any trace property T not in [S] contains some bad
thing.

First, we find a bad thing M for T. By the definition of lifting, $[S] = \mathcal{P}(S) =
\{P \in Prop \mid P \subseteq S\}$. Since T is not in this set, $T \not\subseteq S$. So some trace t is
in T but not in S. By the definition of safety, if $t \notin S$, there is some finite
trace m, a prefix of t, that is a bad thing for S. So no extension of m is in S.
Define M to be {m}.

Second, we show that M is irremediable. Note that $M \le T$ because $m \le t$
and $t \in T$. Let T' be an arbitrary trace property that extends M, that is,
$M \le T'$. By the definition of $\le$, there exists a $t' \in T'$ such that $m \le t'$. We
established above that no extension of m is in S, so $t' \notin S$. But, again by
the definition of lifting, $T' \notin [S]$, since T' contains a trace not in S.

Thus, by definition, [S] is hypersafety.

(⇐) Let S be an arbitrary trace property such that [S] is hypersafety. We want
to show that S is safety. Our strategy is as above: we find a bad thing and
then show that it is irremediable.

Consider any t such that $t \notin S$. By the definition of lifting, we have that
$\{t\} \notin [S]$. By the definition of hypersafety applied to [S], there exists an
$M \le \{t\}$ such that for all $T' \ge M$, we have $T' \notin [S]$.

We claim that M must be non-empty. To show this, suppose for sake of
contradiction that M is empty. Then M is a prefix of every trace property
T', so no T' can be a member of [S], which implies that [S] itself must be
empty. But $[S] = \mathcal{P}(S)$, so [S] must at least contain S as a member. This is
a contradiction, thus M is non-empty and contains at least one trace.

All traces in M must be prefixes of t, by the definition of $\le$. Choose the
longest such prefix in M and denote it as $m^*$. This $m^*$ serves as a bad thing
for t, as we show next.

Let t' be arbitrary such that $m^* \le t'$, and let $T' = \{t'\}$. By the transitivity of
$\le$, we have $M \le T'$, so $T' \notin [S]$ by the above application of the definition
of hypersafety. But this implies that $t' \notin S$, by the definition of lifting.

We have shown that, for any $t \notin S$, there exists an $m \le t$, such that for any
$t' \ge m$, we have $t' \notin S$. Therefore, S is safety, by definition. □

Theorem 4.1. $SHP \subset SSC$.

Proof. Assume that $\mathbf{S}$ is hypersafety. For sake of contradiction, also assume that
$\mathbf{S}$ is not subset closed. This latter assumption implies that there exist two trace
properties T and T' such that $T \in \mathbf{S}$, and $T' \notin \mathbf{S}$, yet $T' \subseteq T$. By the definition
of hypersafety, since $T' \notin \mathbf{S}$, there exists an observation M that is a bad thing
for T', that is, $M \le T'$ and for all T'' such that $M \le T''$, it holds that $T'' \notin \mathbf{S}$.
Consider this M. By the definition of $\le$, since $T' \subseteq T$ and $M \le T'$, we have
$M \le T$. Then T is an instance of T'' above, which means $T \notin \mathbf{S}$. But this
contradicts $T \in \mathbf{S}$. Therefore, $\mathbf{S}$ must be subset closed.

To see that the subset relation is strict, define the trace property true as $\Psi_{inf}$.
Consider any liveness property L other than true, for example, guaranteed
service GS (4.1.3). When lifted to hyperproperty [L], the result is subset closed by
definition of $[\cdot]$. By proposition 4.2 below (whose proof does not depend on this
theorem), [L] is hyperliveness. Since L is not true, we have that [L] is not true,
which is the only hyperproperty that is both hypersafety and hyperliveness. So
[L] cannot be hypersafety. Thus [L] is a hyperproperty that is not hypersafety
but is subset closed. □

Theorem 4.2. $(\forall S \in Sys, \mathbf{K} \in KSHP(k) : (\exists K \in SP : S \models \mathbf{K} \iff S^k \models K))$.

Proof. Let $\mathbf{K}$ be an arbitrary k-safety hyperproperty of system S. Our strategy is
to construct a safety property K that holds of system $S^k$ exactly when $\mathbf{K}$ holds
of S.

Since $\mathbf{K}$ is k-safety, every trace property not contained in it has some bad
thing of size at most k, that is, for all $T \notin \mathbf{K}$, there exists an observation M
where $|M| \le k$ and $M \le T$, such that for all T' where $M \le T'$, it holds that
$T' \notin \mathbf{K}$. Construct the set $\mathcal{M}$ of all such bad things:

$$\mathcal{M} = \{M \in Obs \mid |M| \le k \wedge (\exists T \in Prop : T \notin \mathbf{K} \wedge M \le T)
    \wedge (\forall T' \in Prop : M \le T' \implies T' \notin \mathbf{K})\}.$$

Next we define some notation to encode a set of traces as a single trace.
Consider a trace property T such that $|T| \le k$. Construct a finite list of traces
$t_1, t_2, \ldots, t_k$ such that $t_i \in T$ for all i. Further, we require that no $t_i$ is equal to
any $t_j$ for any distinct i and j, unless $|T| < k$. We construct a trace t such that t[j] is the
tuple $(t_1[j], t_2[j], \ldots, t_k[j])$; note that t is a trace over state space $\Sigma^k$. Let trace t so
constructed from T be denoted $zip_k(T)$, and let the inverse of this construction
be denoted $unzip_k(t)$; note that $zip_k(\cdot)$ and $unzip_k(\cdot)$ are partial functions. We can
also apply this notation to observations, which are finite sets of finite traces.22
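
For observations, the encoding can be sketched in a few lines of Python. This is an
illustration under stated assumptions rather than the construction itself: traces are
finite tuples of states, the list order is fixed by sorting (the text leaves it
unspecified), `None` stands in for the padding state of footnote 22, and empty traces
are simply rejected.

```python
BOTTOM = None  # stands in for the padding state (written ⊥ in footnote 22)


def zip_k(traces, k):
    """Encode a nonempty set of at most k nonempty finite traces as one trace of k-tuples."""
    ordered = sorted(set(traces))  # fixed order is an assumption of this sketch
    if not 1 <= len(ordered) <= k or any(len(t) == 0 for t in ordered):
        raise ValueError("need between 1 and k nonempty traces")
    # When there are fewer than k traces, repeat the last one to fill the list.
    padded = ordered + [ordered[-1]] * (k - len(ordered))
    length = max(len(t) for t in padded)
    return tuple(
        tuple(t[j] if j < len(t) else BOTTOM for t in padded)
        for j in range(length)
    )


def unzip_k(zipped):
    """Recover the set of traces from an encoded trace, dropping padding."""
    k = len(zipped[0])
    return {
        tuple(step[i] for step in zipped if step[i] is not BOTTOM)
        for i in range(k)
    }


M = {("a", "b"), ("c",)}
encoded = zip_k(M, 3)
```

Repeating a trace when the set has fewer than k members mirrors the allowance that
list entries may coincide when $|T| < k$; decoding collapses the duplicates again because
the result is a set.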

Now we can construct safety property K. Let K be the set of traces over $\Sigma^k$
such that no trace in K encodes an extension of any bad thing $M \in \mathcal{M}$:

$$K = \{t^k \mid \neg(\exists M \in Obs : M \in \mathcal{M} \wedge zip_k(M) \le t^k)\},$$

where $t^k$ denotes a trace t over space $\Sigma^k$.

To see that K is safety, suppose that $t^k \notin K$. Then by the definition of K,
there must exist some $M \in \mathcal{M}$ such that $zip_k(M) \le t^k$. Consider any trace
$u^k \ge zip_k(M)$. By the definition of K, we have that $u^k \notin K$. Thus, for any trace
$t^k$ not in K, there is some finite bad thing $zip_k(M)$, such that no extension $u^k$ of
the bad thing is in K. By definition, K is therefore safety.

Finally, we need to show that S satisfies $\mathbf{K}$ exactly when $S^k$ satisfies K. We
do so by mutual implication.

(⇒) Suppose $S \models \mathbf{K}$. Then, by definition, $S \in \mathbf{K}$. For sake of contradiction,
suppose that $S^k \not\subseteq K$. Then, by the definition of subset, there exists some
$t^k \in S^k$ such that $t^k \notin K$. Let T be $unzip_k(t^k)$. By the definition of K, there
must exist some $M \in \mathcal{M}$ such that $zip_k(M) \le t^k$. Applying $unzip_k(\cdot)$ to
this predicate, and noting that unzip is monotonic with respect to $\le$, we
obtain $M \le unzip_k(t^k)$. By the definition of T, we then have that $M \le T$.
By the construction of $\mathcal{M}$, T therefore cannot be in $\mathbf{K}$. By the construction
of $S^k$ and the definition of T, each trace in T must also be a trace of S.
So by definition, $T \le S$. By transitivity, we have that $M \le S$. By the
construction of $\mathcal{M}$, S then cannot be in $\mathbf{K}$. But this contradicts the fact that
$S \in \mathbf{K}$. Therefore, $S^k \subseteq K$, so by definition $S^k \models K$.

(⇐) Suppose $S^k \models K$. Then, by definition, $S^k \subseteq K$. Suppose, for sake of
contradiction, that S does not satisfy $\mathbf{K}$. Then, by definition, $S \notin \mathbf{K}$. Since
$\mathbf{K}$ is k-safety, this means that there exists an $M \le S$, where $|M| \le k$,
such that for all $T' \ge M$, $T' \notin \mathbf{K}$. Let $m^k$ be $zip_k(M)$, and let $s^k$ be a
trace of $S^k$ such that $m^k \le s^k$ (such a trace must exist since $M \le S$). By
the construction of K, for any $t^k \ge m^k$, we have that $t^k \notin K$. Therefore,
$s^k \notin K$, and it follows that $S^k \not\subseteq K$. But this contradicts the fact that
$S^k \subseteq K$. Therefore, $S \in \mathbf{K}$, so by definition $S \models \mathbf{K}$. □

22 In this case, the $t_i$ have finite and potentially differing length. So if $j > |t_i|$, let $t_i[j] = \bot$
for some new state $\bot \notin \Sigma$. Thus, $zip_k(T)$ is a trace over state space $(\Sigma \cup \{\bot\})^k$. We redefine trace
prefix $\le$ over this space to ignore $\bot$: let $t \le t'$ iff, for some t'' that is a trace over $\Sigma$, $\lfloor t \rfloor t'' = \lfloor t' \rfloor$,
where $\lfloor t \rfloor$ is the truncation of t that removes any $\bot$ states. For notational simplicity, we omit this
technicality in the remainder of the proof.

Proposition  4.2.  (VL  6  Prop  :  L  e  LP  [L\  e  LHP). 

Proof.  By  mutual  implication. 

($\Rightarrow$) Let L be an arbitrary liveness property. We want to show that $[L]$ is a liveness hyperproperty; that is, any observation M can be extended to a trace property T that is contained in $[L]$. So let M be an arbitrary observation. By the definition of liveness, for each $m \in M$, there exists some $t \ge m$ such that $t \in L$. For a given m, let that trace t be denoted $t_m$. Construct the set $T = \bigcup_{m \in M} \{t_m\}$. Since all the $t_m$ are elements of L, we have $T \subseteq L$. By the definition of lifting, it follows that T is contained in $[L]$. Further, T extends M by the construction of T. Thus, T satisfies the requirements of the trace property we needed to construct. By definition, $[L]$ is hyperliveness.

($\Leftarrow$) Let L be an arbitrary property such that $[L]$ is hyperliveness. We want to show that L is liveness. So consider an arbitrary trace t, and let $T = \{t\}$. Since $[L]$ is hyperliveness, we have that there exists a $T'$ such that $T \le T'$ and $T' \in [L]$. Since $T \le T'$ and $T = \{t\}$, there exists a $t'$ such that $t \le t'$ and $t' \in T'$, by the definition of $\le$. By the definition of lifting, if $t' \in T' \in [L]$, it must be the case that $t' \in L$. Thus, for any t, there exists a $t'$ such that $t \le t'$ and $t' \in L$. Therefore, L is liveness, by definition. □

Theorem 4.3. $\mathrm{PIF} \subset \mathrm{LHP}$.

Proof. Let P be an arbitrary possibilistic information-flow hyperproperty, and let $\mathit{Cl}_P$ be the closure operator that Mantel [78] would associate with P.23 Then, by Mantel's Definition 10, it must be the case that $P = \{\mathit{Cl}_P(T) \mid T \in \mathrm{Prop}\}$. Closure operators must satisfy the axiom $(\forall X : X \subseteq \mathit{Cl}(X))$, which we use below.

To show that P is hyperliveness, let $T \in \mathrm{Obs}$ be arbitrary. By the definition of hyperliveness, we need to show that there exists a $T' \in \mathrm{Prop}$ such that $T \le T'$ and $T' \in P$. Let $T'$ be $\mathit{Cl}_P(\hat{T})$, where $\hat{T}$ denotes the embedding of T into Prop by infinitely stuttering the final state of each trace in T, as discussed in §4.1. By the closure axiom above, we have that $\hat{T} \subseteq \mathit{Cl}_P(\hat{T})$. So by the definition of $\le$, we can conclude $T \le \mathit{Cl}_P(\hat{T}) = T'$. Further, $T'$ must be an element of P since it is the $\mathit{Cl}_P$-closure of trace property $\hat{T}$. Therefore, $T'$ satisfies the required conditions, and P is hyperliveness.

To see that the subset relation is strict, consider liveness property GS (guaranteed service) from §4.1. It corresponds to liveness hyperproperty $[GS]$, but has no corresponding closure operator. For suppose that such a closure operator did exist, and consider an infinite trace t in which service fails to occur. The closure of any set containing t must still contain t, by the axiom above. But then the closure does not satisfy GS, and so the closure operator cannot correspond to $[GS]$. □

23 More precisely, Mantel argues that every "possibilistic information-flow property [sic]" can be expressed as a basic security predicate, and that each basic security predicate induces a set of closure operators. Any element of this set suffices to instantiate $\mathit{Cl}_P$. Also, Mantel's closure operators were over finite traces, and we have generalized to infinite traces.

Proposition 4.3. $\mathbf{O}_B = \mathbf{O}_{SB}$.

Proof.  By  mutual  containment. 

($\supseteq$) By definition, the elements of $\mathbf{O}_B$ are finite intersections of elements of $\mathbf{O}_{SB}$. Thus, every element of $\mathbf{O}_{SB}$ is already trivially an element of $\mathbf{O}_B$.

($\subseteq$) Let $\mathcal{N}$ be an arbitrary element of $\mathbf{O}_B$. By the definition of a base, we can write $\mathcal{N}$ as $\bigcap_i \uparrow M_i$, where i ranges over a finite index set and each $M_i$ is an observation. We want to show that there exists an element $\uparrow N$ of $\mathbf{O}_{SB}$ such that $\mathcal{N} = \uparrow N$. So consider $\mathcal{N}$. Every trace property T in it must extend every $M_i$. Thus, by the definition of $\le$, every such trace property T extends $\bigcup_i M_i$. Conversely, any trace property that extends $\bigcup_i M_i$ extends each $M_i$. Therefore $\mathcal{N} = \uparrow \bigcup_i M_i$. Our desired observation N is thus $\bigcup_i M_i$. Note that, for N to be a valid observation, it must be a finite set. The union over the $M_i$ must therefore result in a finite set, which it does, since i ranges over a finite index set. □
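The central step of this proof, that intersecting completions of observations yields the completion of their union, can be checked by brute force on a small finite model. The Python sketch below is an illustration only and rests on assumptions that are not part of the formal development: complete traces are the four two-state strings over {a, b}, finite traces are their non-empty prefixes, and `extends` and `up` are hypothetical helper names standing for $\le$ and $\uparrow$.

```python
from itertools import chain, combinations, product

# Toy universe (an assumption for illustration): complete traces are the
# two-state strings over {a, b}; finite traces are their non-empty prefixes.
TRACES = ["".join(p) for p in product("ab", repeat=2)]
PREFIXES = sorted({t[:n] for t in TRACES for n in (1, 2)})

def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

PROP = powerset(TRACES)    # every trace property of the toy universe
OBS = powerset(PREFIXES)   # every observation (finite set of finite traces)

def extends(m, t):
    """M <= T: each trace in m is a prefix of some trace in t."""
    return all(any(full.startswith(pre) for full in t) for pre in m)

def up(m):
    """Completion of m: the set of trace properties that extend m."""
    return {t for t in PROP if extends(m, t)}

# Intersecting two completions gives the completion of the union.
assert all(up(m1) & up(m2) == up(m1 | m2) for m1 in OBS for m2 in OBS)
```

The exhaustive check is feasible only because the toy universe has 16 trace properties and 64 observations; it illustrates the argument and does not replace the proof.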

Proposition 4.4. $\mathrm{SHP} = \mathcal{C}$.

Proof.  By  mutual  containment. 

($\subseteq$) Let S be an arbitrary safety hyperproperty. We need to show that it is also a closed set. By the definition of closed, this is equivalent to showing that S is the complement of an open set. Our strategy is to construct hyperproperty O, show that O and $\overline{S}$ are equal, and show that O is open.

By the definition of hypersafety, we have that any trace property T that is not a member of S (and thus is a member of $\overline{S}$) must contain some bad thing. Consider the set $\mathcal{M} \in \mathcal{P}(\mathrm{Obs})$ of all bad things for S. $\mathcal{M}$ contains one or more elements for every trace property in $\overline{S}$:

\[ \mathcal{M} = \{M \in \mathrm{Obs} \mid (\exists T \in \overline{S} : M \le T \land (\forall T' \in \mathrm{Prop} : M \le T' \implies T' \in \overline{S}))\}. \]

Next, define O as the completion of $\mathcal{M}$; that is, the set of all trace properties that extend a bad thing for S:

\[ O = \bigcup_{M \in \mathcal{M}} \uparrow M = \{T \mid (\exists M \in \mathcal{M} : M \le T)\}, \qquad (4.A.1) \]

where the equality follows by the definition of $\uparrow M$. Since each such trace property T violates S, we would suspect that O is the complement of S. This is indeed the case:

Claim. $O = \overline{S}$.

Proof. (By mutual containment.)

($\subseteq$) Suppose $T \in O$. Then by equation 4.A.1, there is some $M \in \mathcal{M}$ such that $M \le T$. By the definition of $\mathcal{M}$, any extension of M is an element of $\overline{S}$. Since T is such an extension, $T \in \overline{S}$.

($\supseteq$) Suppose $T \in \overline{S}$. Then $T \notin S$, so by the definition of hypersafety, $(\exists M \in \mathrm{Obs} : M \le T \land (\forall T' \in \mathrm{Prop} : M \le T' \implies T' \notin S))$. Consider that M. It must be a member of $\mathcal{M}$, by definition. Since $M \le T$, we have that $T \in O$ by equation 4.A.1. □

All that remains is to show that O is open. First, note that $\uparrow M$, for any $M \in \mathrm{Obs}$, is by definition an element of $\mathbf{O}_{SB}$. Thus each of the sets $\uparrow M$ in the definition of O is open. Second, by the definition of open sets, a union of open sets is open. O is such a union, and is therefore open.

($\supseteq$) Let C be an arbitrary closed set. We need to show that it is also hypersafety. Our strategy is to identify, for any trace property T not in C, a bad thing for T. If such a bad thing exists for all T, then C is by definition hypersafety.

Since C is closed, it is by definition the complement of an open set. By proposition 4.3, we can therefore write $\overline{C}$ as follows:

\[ \overline{C} = \bigcup_i \uparrow M_i, \qquad (4.A.2) \]

where each $M_i$ is an observation.

Let T be an arbitrary trace property such that $T \notin C$, or equivalently, such that $T \in \overline{C}$. Then T must be in at least one of the sets in the (possibly infinite) union in equation 4.A.2. Thus, there must exist an i such that

\[ T \in \uparrow M_i \quad \text{and} \quad \uparrow M_i = \{U \in \mathrm{Prop} \mid M_i \le U\}, \qquad (4.A.3) \]

where the equality follows from the definition of $\uparrow$.

We construct the bad thing M for T by defining:

\[ M = M_i. \]

We have that $M \le T$, because of equation 4.A.3.

To show that M is a bad thing for T, consider any $T' \ge M$. By the definition of M, $T' \ge M_i$. By equation 4.A.3, it follows that $T'$, like T, is a member of $\uparrow M_i$. By equation 4.A.2, $T' \in \overline{C}$. Therefore, $T' \notin C$.

We have now shown that for any $T \notin C$, there exists an $M \le T$, such that for all $T' \ge M$, $T' \notin C$. Thus C is hypersafety, by definition. □
Proposition 4.5. $\mathrm{LHP} = \mathcal{D}$.


Proof.  By  mutual  containment. 

($\subseteq$) Let L be an arbitrary liveness hyperproperty. We need to show that L is dense. By the definition of dense, we must therefore show that L intersects every non-empty open set. So let O be an arbitrary non-empty open set. We need to show that $L \cap O$ is non-empty. By proposition 4.3 and the definition of open, we can write O as $\bigcup_i \uparrow M_i$. Consider an arbitrary $M_i$. Since L is hyperliveness, there exists a $T \ge M_i$ such that $T \in L$. Further, by the definition of $\uparrow$, we have that $T \in O$. Therefore, $T \in L \cap O$, and it follows that L is dense, by definition.

($\supseteq$) Let D be an arbitrary dense set. To show that D is hyperliveness, we must show that any observation T can be extended to a trace property $T'$ contained in D; that is, $(\forall T \in \mathrm{Obs} : (\exists T' \in \mathrm{Prop} : T \le T' \land T' \in D))$. So let T be an arbitrary observation. Let $O_T$ be the completion of T:

\[ O_T = \uparrow T = \{T' \in \mathrm{Prop} \mid T \le T'\}. \qquad (4.A.4) \]

$O_T$ is an element of $\mathbf{O}_{SB}$, the subbase of our topology, by definition. Thus, by the definition of a subbase, $O_T$ is an open set. It is non-empty, because the embedding of T into Prop by stuttering (§4.1) extends T. By the definition of a dense set (which is that a dense set intersects every non-empty open set), we therefore have that $O_T \cap D \ne \emptyset$. Let $T'$ be any element in the set $O_T \cap D$. By equation 4.A.4, we have $T \le T'$.

We have now shown that, for an arbitrary observation T, there exists a trace property $T'$ such that $T \le T'$ and $T' \in D$. Therefore, D is hyperliveness, by definition. □




Theorem 4.4. $\mathbf{O} = \mathcal{V}_L(\mathcal{O})$.


Proof.  By  mutual  containment. 

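($\subseteq$) Suppose $O \in \mathbf{O}$. By the definitions of a base and of $\mathbf{O}$, we can write O as $\bigcup_i^{\infty} \uparrow M_i$, where each $M_i$ is an element of Obs.24 Now we calculate:

    $\bigcup_i^{\infty} \uparrow M_i$
=   ( definition of $\uparrow$ )
    $\bigcup_i^{\infty} \{T \mid T \ge M_i\}$
=   ( definition of $\le$ )
    $\bigcup_i^{\infty} \{T \mid (\forall^{*} m_{ij} \in M_i : (\exists t \in T : m_{ij} \le t))\}$
=   ( definition of $\uparrow$ )
    $\bigcup_i^{\infty} \{T \mid (\forall^{*} m_{ij} \in M_i : \uparrow m_{ij} \cap T \ne \emptyset)\}$
=   ( definition of $\langle \cdot \rangle$ )
    $\bigcup_i^{\infty} \{T \mid (\forall^{*} m_{ij} \in M_i : T \in \langle \uparrow m_{ij} \rangle)\}$
=   ( definition of $\cap$ )
    $\bigcup_i^{\infty} \bigcap_j^{*} \langle \uparrow m_{ij} \rangle$

Since $\uparrow m_{ij} \in \mathcal{O}_B$ by definition, and $\mathcal{O}_B \subseteq \mathcal{O}$ by the definition of base, we have that $\langle \uparrow m_{ij} \rangle$ is an element of the subbase of $\mathcal{V}_L(\mathcal{O})$. Thus, by the definition of subbase, $\bigcup_i^{\infty} \bigcap_j^{*} \langle \uparrow m_{ij} \rangle \in \mathcal{V}_L(\mathcal{O})$. Therefore, by the calculation above, we can conclude $O \in \mathcal{V}_L(\mathcal{O})$.

24 We decorate quantifiers with $\infty$ and $*$ to denote an infinite and finite range, respectively.

($\supseteq$) Suppose $O \in \mathcal{V}_L(\mathcal{O})$. By the definition of subbase and $\mathcal{V}_L$, we can write O as $\bigcup_i^{\infty} \bigcap_j^{*} \langle O_{ij} \rangle$, where each $O_{ij}$ is an element of $\mathcal{O}$. Now we calculate:

    $\bigcup_i^{\infty} \bigcap_j^{*} \langle O_{ij} \rangle$
=   ( definition of $\langle \cdot \rangle$ )
    $\bigcup_i^{\infty} \bigcap_j^{*} \{T \mid T \cap O_{ij} \ne \emptyset\}$

Since $O_{ij}$ is open in the base topology $\mathcal{O}$, it can be rewritten as a union of base open sets $\uparrow t_{ijk}$, where each $t_{ijk}$ is a finite trace:

\[ O_{ij} = \bigcup_k^{\infty} \uparrow t_{ijk}. \]

We continue calculating:

=   ( rewriting $O_{ij}$ )
    $\bigcup_i^{\infty} \bigcap_j^{*} \{T \mid T \cap (\bigcup_k^{\infty} \uparrow t_{ijk}) \ne \emptyset\}$
=   ( set theory )
    $\bigcup_i^{\infty} \{T \mid (\forall^{*} j : (\exists^{\infty} k : T \cap \uparrow t_{ijk} \ne \emptyset))\}$
=   ( definition of $\le$ )
    $\bigcup_i^{\infty} \{T \mid (\forall^{*} j : (\exists^{\infty} k : \{t_{ijk}\} \le T))\}$
=   ( set theory; let $k'$ be the k guaranteed to exist for i and j )
    $\bigcup_i^{\infty} \{T \mid \bigcup_j^{*} \{t_{ijk'}\} \le T\}$
=   ( let $M_i = \bigcup_j^{*} \{t_{ijk'}\}$; definition of $\uparrow$ )
    $\bigcup_i^{\infty} \uparrow M_i$

Finally, since $M_i$ is a finite set of finite traces, it is an element of Obs. So by definition, $\uparrow M_i \in \mathbf{O}_{SB}$. Thus by the definition of base, $\bigcup_i^{\infty} \uparrow M_i \in \mathbf{O}$. Therefore, by the calculation above, we can conclude $O \in \mathbf{O}$. □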
Proposition 4.6. $\mathrm{SHP} = \mathit{Cl}_c(\{[S] \mid S \in \mathrm{SP}\})$.

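Proof. Let $\mathbf{S}$ be an arbitrary safety hyperproperty. By proposition 4.4, $\mathbf{S}$ is a closed set in topology $\mathbf{O}$. By theorem 4.4, $\mathbf{S}$ is thus also a closed set in topology $\mathcal{V}_L(\mathcal{O})$. By the definition of closed, $\mathbf{S}$ is the complement of an open set in topology $\mathcal{V}_L(\mathcal{O})$. Letting $\sim$ denote set complement, we can thus write $\sim\mathbf{S}$ as unions of intersections of base elements, by the definition of a base. We calculate:

    $\sim\mathbf{S}$
=   ( definition of base )
    $\bigcup_i^{\infty} \bigcap_j^{*} \langle O_{ij} \rangle$
=   ( definition of $\langle \cdot \rangle$ )
    $\bigcup_i^{\infty} \bigcap_j^{*} \{T \mid T \cap O_{ij} \ne \emptyset\}$
=   ( double negation )
    $\sim\sim \bigcup_i^{\infty} \bigcap_j^{*} \{T \mid T \cap O_{ij} \ne \emptyset\}$
=   ( set theory )
    $\sim \bigcap_i^{\infty} \bigcup_j^{*} \{T \mid T \cap O_{ij} = \emptyset\}$
=   ( set theory )
    $\sim \bigcap_i^{\infty} \bigcup_j^{*} \{T \mid T \subseteq \overline{O_{ij}}\}$
=   ( definition of $[\cdot]$ )
    $\sim \bigcap_i^{\infty} \bigcup_j^{*} [\overline{O_{ij}}]$

Removing a complement from each side of the above equation, we obtain

\[ \mathbf{S} = \bigcap_i^{\infty} \bigcup_j^{*} [\overline{O_{ij}}]. \]

Since each $O_{ij}$ is open in topology $\mathcal{O}$, we have that $\overline{O_{ij}}$ is closed in $\mathcal{O}$. By the fact that closed sets in $\mathcal{O}$ correspond to safety properties [5], $\overline{O_{ij}}$ is a safety property. Therefore, $\mathbf{S}$ is the infinite intersection of finite unions of lifted safety properties, and by definition of $\mathit{Cl}_c$ must be an element of $\mathit{Cl}_c(\{[S] \mid S \in \mathrm{SP}\})$.

Similarly, given an arbitrary element of $\mathit{Cl}_c(\{[S] \mid S \in \mathrm{SP}\})$, the same reasoning used above establishes that it is also an element of SHP. Therefore, by mutual containment, the two sets are equal. □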

Theorem 4.5. $(\forall P \in \mathrm{HP} : (\exists S \in \mathrm{SHP}, L \in \mathrm{LHP} : P = S \cap L))$.

Proof. This theorem can be easily proved by adapting either the logical [105] or topological [5] proof of the intersection theorem for trace properties. The domains involved are merely upgraded to include an additional level of sets. Here we take the former approach and rehearse the logical proof.

Our strategy is as follows. Given hyperproperty P, we construct safety hyperproperty S that contains P as a subset. We also construct liveness hyperproperty L that contains P. The intersection of S and L then necessarily contains P, and we shall show that the intersection is, in fact, exactly P.

To construct S, we define the safety hyperproperty Safe(P), which stipulates that the hyperliveness of P is never violated. A bad thing for this safety hyperproperty is any set of traces that cannot be extended to satisfy P. So we require that Safe(P) contains only sets T of traces such that any observation of T can be extended to satisfy P. Formally,

\[ \mathit{Safe}(P) = \{T \in \mathrm{Prop} \mid (\forall M \in \mathrm{Obs} : M \le T \implies (\exists T' \in \mathrm{Prop} : M \le T' \land T' \in P))\}. \]

It is straightforward to establish that Safe(P) is hypersafety: Any set T not contained in Safe(P) must satisfy the negation of the predicate in the above definition of Safe(P); that is, $(\exists M \in \mathrm{Obs} : M \le T \land (\forall T' \in \mathrm{Prop} : M \le T' \implies T' \notin P))$. If no extension of M can be in P, then no extension $T'$ of M can be in Safe(P) because the hyperliveness of P would be violated in $T'$ at observation M. So

\[ (\forall T' \in \mathrm{Prop} : M \le T' \implies T' \notin P) \implies (\forall T' \in \mathrm{Prop} : M \le T' \implies T' \notin \mathit{Safe}(P)). \qquad (4.A.5) \]

Thus, by monotonicity, $(\exists M \in \mathrm{Obs} : M \le T \land (\forall T' \in \mathrm{Prop} : M \le T' \implies T' \notin \mathit{Safe}(P)))$. Therefore Safe(P) is hypersafety.

Similarly, to construct L, we define the liveness hyperproperty Live(P), which stipulates that it is always possible either to satisfy P or to reach a point at which some bad thing makes it impossible to satisfy P. In the latter case, a safety hyperproperty has been violated, namely Safe(P). Formally,

\[ \mathit{Live}(P) = P \cup \overline{\mathit{Safe}(P)}, \]

where $\overline{H}$ denotes the complement of hyperproperty H with respect to Prop. To show that Live(P) is hyperliveness, consider any observation T. Suppose that T can be extended to some trace property $T'$ such that $T' \in P$. Then $T'$ is also in Live(P), so Live(P) is hyperliveness for T. On the other hand, if T cannot be extended to satisfy P, then T is a bad thing for Safe(P); that is, $(\forall T' \in \mathrm{Prop} : T \le T' \implies T' \notin P)$. Let $T'$ be an arbitrary extension of T. By the same reasoning as equation (4.A.5), $T'$ is not in Safe(P). Therefore $T'$ must be in $\overline{\mathit{Safe}(P)}$. Thus, Live(P) is again hyperliveness for T. We conclude that Live(P) is hyperliveness.

Next, note that $P \subseteq \mathit{Safe}(P)$, because any element T of P satisfies the definition of Safe(P). In particular, for any $M \le T$, there is a $T' \ge M$ such that $T' \in P$, namely $T' = T$. Thus, $\mathit{Safe}(P) = P \cup \mathit{Safe}(P)$.

Finally, let S = Safe(P) and L = Live(P), and we prove the theorem by simple set manipulation:

\[
\begin{aligned}
S \cap L &= \mathit{Safe}(P) \cap \mathit{Live}(P) \\
&= (P \cup \mathit{Safe}(P)) \cap (P \cup \overline{\mathit{Safe}(P)}) \\
&= P \cap (\mathit{Safe}(P) \cup \overline{\mathit{Safe}(P)}) \\
&= P \cap \mathrm{Prop} \\
&= P
\end{aligned}
\]
□
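The construction of Safe(P) and Live(P) can be exercised on a small finite model. The Python sketch below is an illustration under assumptions that are not part of the proof: complete traces are the four two-state strings over {a, b}, observations are sets of their non-empty prefixes, and the example hyperproperty `P` (non-empty trace sets whose traces all end in the same state) is hypothetical. The helper names `extends`, `safe`, and `live` are likewise chosen for this sketch.

```python
from itertools import chain, combinations, product

# Toy universe (an assumption for illustration only).
TRACES = ["".join(p) for p in product("ab", repeat=2)]
PREFIXES = sorted({t[:n] for t in TRACES for n in (1, 2)})

def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

PROP = powerset(TRACES)    # all trace properties
OBS = powerset(PREFIXES)   # all observations

def extends(m, t):
    """M <= T: each trace in m is a prefix of some trace in t."""
    return all(any(full.startswith(pre) for full in t) for pre in m)

def safe(p):
    """Safe(P): trace sets whose every observation extends into p."""
    return {t for t in PROP
            if all(any(extends(m, t2) for t2 in p)
                   for m in OBS if extends(m, t))}

def live(p):
    """Live(P): p together with the complement of Safe(P)."""
    return set(p) | (set(PROP) - safe(p))

# Hypothetical example: non-empty trace sets whose traces agree on the
# final state.
P = {t for t in PROP if t and len({tr[-1] for tr in t}) == 1}
S, L = safe(P), live(P)

assert P <= S          # P is contained in Safe(P)
assert S & L == P      # the intersection is exactly P
```

In this toy model Safe(P) adds only the empty trace set to P, and Live(P) contains every trace set except the empty one, so neither component alone equals P.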
CHAPTER 5


FORMALIZATION  OF  SYSTEM  REPRESENTATIONS 

Security policies are properties of systems, meaning that a system either does or does not satisfy a security policy. Chapter 4 models systems (and their executions) with trace sets. Some models of system execution are expressed with other mathematical formalisms, for example, relational semantics, labeled transition systems, and state machines. And probability can be used with each of these formalisms to model random behaviors of systems. Chapter 4 mentions some of these formalisms but does not make them precise.

For example, recall that noninterference stipulates that commands executed on behalf of users holding high clearances have no effect on system behavior observed by users holding low clearances. Goguen and Meseguer's definition of noninterference [46] models system behavior with state machines, whereas our definition GMNI (4.1.4), repeated below, assumes an encoding of state machines as trace sets and requires a trace set T to contain, for any trace t, a corresponding trace $t'$ with no high input events yet with the same low input and output events as t:

\[ \mathit{GMNI} = \{T \in \mathrm{Prop} \mid T \in \mathrm{SM} \land (\forall t \in T : (\exists t' \in T : \mathit{ev}_{Hin}(t') = \epsilon \land \mathit{ev}_L(t) = \mathit{ev}_L(t')))\}. \]

Conjunct $T \in \mathrm{SM}$ expresses the requirement that trace set T encodes a state machine, but we have not yet defined set SM (we shall in §5.4). Nor have we classified GMNI as hypersafety or hyperliveness.

It is reasonable to expect that GMNI is hypersafety; the bad thing should be a set $\{t, t'\}$ of finite traces where $t'$ contains no high inputs and contains the same low inputs as t, yet t and $t'$ have different low outputs. But GMNI fails to be hypersafety because of a technicality. Goguen and Meseguer's state machines must be deterministic, so SM must exclude all trace sets that exhibit nondeterminism. Thus a system T might fail to satisfy GMNI only because T is nondeterministic, in which case a deterministic, non-interfering observation of T would be remediable; hence GMNI would not be hypersafety.1 The problem is that the definition of hypersafety, by quantifying over Prop, assumed that systems are allowed to be nondeterministic. Now that we are interested in state machines, our definitions of hypersafety and hyperliveness should quantify over only those trace sets that encode state machines. And in general, those definitions should be parameterized on a system representation.

This chapter proceeds as follows. The definitions of hypersafety and hyperliveness are generalized in §5.1 to account for system representations. Hyperproperties for relational systems, labeled transition systems, state machines, and probabilistic systems are presented in §5.2, §5.3, §5.4, and §5.5. The technical results of chapter 4 are generalized in §5.6 to account for system representations, and §5.7 concludes.

5.1  Generalized  Hypersafety  and  Hyperliveness 

Chapter 4 assumed a particular system representation, namely Prop, the set of all trace sets. Now, let Rep be a set of trace sets that encodes a system representation. For example, each set in Rep might encode a state machine. Note that Rep is a subset of Prop.

1 A similar problem would occur even if we used implication instead of conjunction in the definition of GMNI to formalize the requirement that systems be (deterministic) state machines: any observation could be remediated by adding traces that represent nondeterministic transitions of the state machine.

Recall that Obs is the set of observations of Prop, and that an observation is a finite set of finite traces. We now need to define the set of observations of Rep. Let Obs(Rep) denote the subset of Obs containing observations of Rep, where

\[ \mathrm{Obs}(\mathrm{Rep}) = \{M \in \mathrm{Obs} \mid (\exists T \in \mathrm{Rep} : M \le T)\}. \]

Note that Obs(Rep) is simply Obs if Rep equals Prop.

Now we can define hypersafety and hyperliveness for a given system representation.

Definition 5.1. A hyperproperty S is a safety hyperproperty for system representation Rep (is hypersafety for Rep) iff

\[ (\forall T \in \mathrm{Rep} : T \notin S \implies (\exists M \in \mathrm{Obs}(\mathrm{Rep}) : M \le T \land (\forall T' \in \mathrm{Rep} : M \le T' \implies T' \notin S))). \]

Definition 5.2. Hyperproperty L is a liveness hyperproperty for system representation Rep (is hyperliveness for Rep) iff

\[ (\forall T \in \mathrm{Obs}(\mathrm{Rep}) : (\exists T' \in \mathrm{Rep} : T \le T' \land T' \in L)). \]

Note that both definitions simplify to the original definitions of hypersafety and hyperliveness in chapter 4 if Rep equals Prop. We now demonstrate the use of these generalized definitions with several system representations.
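How the classification of a hyperproperty can depend on the system representation may be seen by evaluating Definitions 5.1 and 5.2 exhaustively on a small finite model. The Python sketch below is an illustration under assumptions that are not part of the text: complete traces are the two-state strings over {a, b}, a trace set counts as deterministic when no two of its traces share an initial state, and the hyperproperty `H` (every nondeterministic trace set, plus every trace set whose traces all end in state a) is a hypothetical example that loosely mirrors an implication-style requirement of determinism. All function names are chosen for this sketch.

```python
from itertools import chain, combinations, product

# Toy universe (an assumption for illustration only).
TRACES = ["".join(p) for p in product("ab", repeat=2)]
PREFIXES = sorted({t[:n] for t in TRACES for n in (1, 2)})

def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

PROP = powerset(TRACES)
OBS = powerset(PREFIXES)

def extends(m, t):
    """M <= T: each trace in m is a prefix of some trace in t."""
    return all(any(full.startswith(pre) for full in t) for pre in m)

def obs_of(rep):
    """Obs(Rep): observations that some system in rep extends."""
    return [m for m in OBS if any(extends(m, t) for t in rep)]

def is_hypersafety(h, rep):
    """Definition 5.1, checked by exhaustive search."""
    obs = obs_of(rep)
    def has_bad_thing(t):
        return any(extends(m, t) and
                   all(t2 not in h for t2 in rep if extends(m, t2))
                   for m in obs)
    return all(t in h or has_bad_thing(t) for t in rep)

def is_hyperliveness(h, rep):
    """Definition 5.2, checked by exhaustive search."""
    return all(any(extends(m, t) and t in h for t in rep)
               for m in obs_of(rep))

def deterministic(t):
    """Toy notion: the initial state determines the whole trace."""
    return len({tr[0] for tr in t}) == len(t)

DET = [t for t in PROP if deterministic(t)]   # a representation Rep
H = {t for t in PROP
     if not deterministic(t) or all(tr.endswith("a") for tr in t)}

print(is_hypersafety(H, PROP), is_hyperliveness(H, PROP))
print(is_hypersafety(H, DET), is_hyperliveness(H, DET))
```

Over Prop, any observation can be remediated by adding traces that make the system nondeterministic, so `H` is hyperliveness and not hypersafety; over the deterministic representation, that remediation is unavailable and `H` is hypersafety and not hyperliveness.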

5.2  Relational  Systems 

In language-based information-flow security [104], a program P is sometimes modeled (e.g., with large-step operational semantics) as a relation $\Downarrow$ such that $\langle P, s \rangle \Downarrow s'$ if P begun in initial state s terminates in final state $s'$. Using this relation, noninterference can be stated as

\[ s_1 =_L s_2 \land \langle P, s_1 \rangle \Downarrow s_1' \land \langle P, s_2 \rangle \Downarrow s_2' \implies s_1' =_L s_2', \]

where relation $=_L$ (cf. observational determinism OD (4.1.6)) determines which states are low-equivalent. This statement of noninterference is termination insensitive because it allows information to leak through termination channels.

To model a program P as set T of traces, intuitively, imagine that an observer of the program periodically checks to see in what state the program is. If P begun in initial state s never terminates, the observer will see an infinite sequence containing only s. If P does terminate in final state $s'$, the observer will see a finite sequence of s followed by an infinite sequence of $s'$. Let T be the set of all such traces. Formally, T is defined as follows:

\[ T = \{t \in \Psi_{\mathrm{inf}} \mid \langle P, s \rangle \Downarrow s' \land t \in s^{+} (s')^{\omega}\} \cup \{t \in \Psi_{\mathrm{inf}} \mid \lnot(\exists s' : \langle P, s \rangle \Downarrow s') \land t = s^{\omega}\}. \]

Let Rel, the set of all relational systems, be the set of all trace sets so constructed for any P.

Define termination-insensitive relational noninterference as a hyperproperty:

\[
\begin{aligned}
\mathit{TIRNI} = \{T \in \mathrm{Prop} \mid{} & T \in \mathrm{Rel} \\
& \land (\forall t_1, t_2 \in T : t_1[0] =_L t_2[0] \\
& \qquad \implies \mathit{diverges}(t_1) \lor \mathit{diverges}(t_2) \\
& \qquad\qquad \lor (\exists s_1, s_2 \in \Sigma : \mathit{terminates}(t_1, s_1) \\
& \qquad\qquad\qquad \land \mathit{terminates}(t_2, s_2) \land s_1 =_L s_2))\}. \qquad (5.2.1)
\end{aligned}
\]

Predicate diverges(t) holds whenever t is a trace of a program P such that P does not terminate when begun in initial state t[0], so $t = (t[0])^{\omega}$. Similarly, predicate terminates(t, s) holds whenever P terminates in final state s when begun in initial state t[0], so $t \in (t[0])^{+} s^{\omega}$. We assume without loss of generality that final states are distinguishable from initial states (e.g., by having a special flag set), so that diverges and terminates can distinguish between nontermination and termination in a final state that otherwise is identical to an initial state. TIRNI is hypersafety for Rel: the bad thing is a pair of traces that begin in low-equivalent initial states but terminate in final states that are not low-equivalent.

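Termination-sensitive noninterference is the same as termination-insensitive noninterference, except that it forbids one trace to diverge and the other to terminate. So define termination-sensitive relational noninterference as follows:

\[
\begin{aligned}
\mathit{TSRNI} = \{T \in \mathrm{Prop} \mid{} & T \in \mathrm{Rel} \\
& \land (\forall t_1, t_2 \in T : t_1[0] =_L t_2[0] \\
& \qquad \implies (\mathit{diverges}(t_1) \land \mathit{diverges}(t_2)) \\
& \qquad\qquad \lor (\exists s_1, s_2 \in \Sigma : \mathit{terminates}(t_1, s_1) \\
& \qquad\qquad\qquad \land \mathit{terminates}(t_2, s_2) \land s_1 =_L s_2))\}. \qquad (5.2.2)
\end{aligned}
\]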

Note that the only change is that a disjunction became a conjunction. TSRNI is neither hypersafety nor hyperliveness for Rel. To see that it is not hypersafety for Rel, consider a system containing a pair $\{t, t'\}$ of traces, where t diverges and $t'$ does not, yet where t and $t'$ contain low-equivalent initial states. Such a system does not satisfy TSRNI. But any finite prefix of this pair could be remediated by extending the prefix of t to terminate in the same final state as $t'$. Likewise, to see that TSRNI is not hyperliveness for Rel,2 consider a finite observation containing a pair of terminating traces that have low-equivalent initial states but not low-equivalent final states. This observation cannot be extended to be in TSRNI.

2 Terauchi and Aiken [115] characterized termination-sensitive noninterference as "2-liveness," where they defined "2-liveness" as a "property which may observe up to two possibly infinite traces to refute the property." Although they are correct that TSRNI could be refuted by observing two infinite traces, refutation is really about safety, not liveness: there is no good thing for TSRNI, but there is an infinitely-observable bad thing. So "2-infinite-safety" would be a better term than "2-liveness."
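The difference between the two policies can be seen on small examples. The Python sketch below is an illustration under assumptions that are not part of the text: a state is a (high, low) pair of bits, low-equivalence compares the low components, and each trace of a relational system is summarized by its initial state and its final state, with `None` standing for divergence (so a summary represents the traces $s^{+}(s')^{\omega}$ or $s^{\omega}$). The three example programs and all function names are hypothetical.

```python
from itertools import product

# Assumption: states are (high, low) pairs of bits.
STATES = list(product((0, 1), repeat=2))

def low_equiv(s1, s2):
    """Low-equivalence: the low components agree."""
    return s1[1] == s2[1]

def summarize(program):
    """One (initial, final) summary per initial state; final is None
    when the program diverges from that initial state."""
    return [(s, program(s)) for s in STATES]

def tirni(summaries):
    """Termination-insensitive check: low-equivalent initial states lead
    to divergence of either run or to low-equivalent final states."""
    return all(f1 is None or f2 is None or low_equiv(f1, f2)
               for (i1, f1), (i2, f2) in product(summaries, repeat=2)
               if low_equiv(i1, i2))

def tsrni(summaries):
    """Termination-sensitive check: both runs diverge, or both
    terminate in low-equivalent final states."""
    return all((f1 is None and f2 is None) or
               (f1 is not None and f2 is not None and low_equiv(f1, f2))
               for (i1, f1), (i2, f2) in product(summaries, repeat=2)
               if low_equiv(i1, i2))

def copy_high(s):        # writes the high bit into the low bit
    return (s[0], s[0])

def loop_if_high(s):     # diverges exactly when the high bit is set
    return None if s[0] == 1 else s

def clear_low(s):        # overwrites the low bit with a constant
    return (s[0], 0)

for prog in (copy_high, loop_if_high, clear_low):
    traces = summarize(prog)
    print(prog.__name__, tirni(traces), tsrni(traces))
```

Under these assumptions, `copy_high` violates both policies, `loop_if_high` satisfies only the termination-insensitive one because it leaks through the termination channel, and `clear_low` satisfies both.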


5.3  Labeled  Transition  Systems 


Definitions of noninterference are sometimes based on bisimulation, which is a relation that specifies whether two systems are equivalent to an observer. Bisimulations are often expressed over labeled transition systems, which are triples $(S, L, \to)$ where S is a set of LTS-states,3 L is a set of labels, and $\to$ is a relation on $S \times L \times S$ [90]. Elements of relation $\to$ are usually notated $s_1 \xrightarrow{\ell} s_2$ and are interpreted to mean that the system has a transition labeled $\ell$ from LTS-state $s_1$ to LTS-state $s_2$.

A labeled transition system $(S, L, \to)$ can be encoded as a set of traces. Define the state space $\Sigma$ for the traces to be $S \times L$.4 Given state $s \in \Sigma$, let st(s) denote the LTS-state from s, and let lab(s) denote the label from s. Define $\mathit{traces}(S, L, \to)$ to be

\[ \{t \mid (\forall i \in \mathbb{N} : \mathit{st}(t[i]) \xrightarrow{\mathit{lab}(t[i])} \mathit{st}(t[i+1]))\}.^{5} \]

Let LTS be the set of all trace sets so constructed for any LTS.
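The encoding can be approximated by enumerating finite prefixes of the traces of a small LTS. The Python sketch below is an illustration under assumptions that are not part of the text: the transition relation is a finite set of (source, label, target) triples, each trace state is an (LTS-state, label) pair, and the last pair of a prefix is only required to name a transition that is enabled in its LTS-state. The sketch does not check that a prefix extends to an infinite trace, and the function name `trace_prefixes` is hypothetical.

```python
def trace_prefixes(transitions, length):
    """Enumerate sequences of (LTS-state, label) pairs of the given
    length in which each pair's label takes its LTS-state to the
    LTS-state of the next pair."""
    # Pairs (s, l) such that some transition s --l--> s' exists.
    enabled = sorted({(s, l) for (s, l, _) in transitions})
    prefixes = [[pair] for pair in enabled]
    for _ in range(length - 1):
        prefixes = [
            prefix + [(state, label)]
            for prefix in prefixes
            for (src, lab, nxt) in sorted(transitions)
            if (src, lab) == prefix[-1]
            for (state, label) in enabled
            if state == nxt
        ]
    return prefixes

# A two-state LTS that alternates between p and q.
lts = {("p", "a", "q"), ("q", "b", "p")}
for prefix in trace_prefixes(lts, 3):
    print(prefix)
```

Storing the label with the source LTS-state follows the definition above; storing it with the target instead would only shift the labels by one position.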


3 We use the term LTS-state to distinguish these from the states defined in §4.1.

4 This construction would not work with an impoverished notion of state, as observed by Focardi and Gorrieri [44] for states that are elements only of L.

5 We could replace $\mathit{lab}(t[i])$ with $\mathit{lab}(t[i+1])$ in this definition; the choice of where to store the label is arbitrary.

Bisimulation nondeducibility on compositions. We now demonstrate how to use this encoding by formalizing Focardi and Gorrieri's [44] definition of bisimulation nondeducibility on compositions (BNDC), which is a noninterference policy for nondeterministic LTSs. The intuition behind this policy is that a system should appear the same to a low observer no matter with what other system it is composed (i.e., run in parallel). Assume that set L of labels can be partitioned into three sets of actions (i.e., events): a set of low security actions, a set H of high security actions, and $\{\tau\}$, where $\tau$ is an unobservable internal action. An LTS $E = (S, L, \to)$ satisfies BNDC, denoted $\mathit{BNDC}(E)$, iff for all LTSs $F = (S, H \cup \{\tau\}, \to_F)$ that take only high and internal actions,

\[ E/H \approx (E|F) \setminus H, \]

with notations $/$, $|$, $\setminus$, and $\approx$ informally defined as follows:6

•  Hiding operator $E/H$ relabels as $\tau$ all actions from H that occur during execution of E. System $E/H$ thus represents the view of system E by a low observer, since all the high actions are hidden.

•  Parallel composition operator $E|F$ denotes the interleaving of systems E and F. The systems can synchronize on actions, causing the composed system to emit internal action $\tau$.

•  Restriction operator $E \setminus H$ prohibits the occurrence of any actions from H during execution of E, meaning that no transition with a label from H is allowed. System $(E|F) \setminus H$ thus represents a low observer's view of E when all the high actions that E takes are synchronized with F.

•  Weak bisimulation relation $E \approx F$ intuitively means that E and F can simulate each other: if E can take a transition with label $\ell$, then there must exist a transition of F that is also labeled $\ell$, and after taking those transitions E and F must remain bisimilar. F is allowed to take any number of internal transitions (labeled $\tau$) before or after the $\ell$-labeled transition. Further, the relation must be symmetric, such that if $E \approx F$ then $F \approx E$.

6 The formal definitions (over LTSs) are standard and given by Focardi and Gorrieri [44]. It is straightforward to define them directly over trace sets.

Thus, if $E/H \approx (E|F) \setminus H$, a low observer's view of E does not change when E is composed with any high security system F. The hyperproperty corresponding to Focardi and Gorrieri's BNDC is

\[ \mathbf{BNDC} = \{T \in \mathrm{Prop} \mid T \in \mathrm{LTS} \land (\exists E \in \mathrm{LTS} : T = \mathit{traces}(E) \land \mathit{BNDC}(E))\}. \qquad (5.3.1) \]

$\mathbf{BNDC}$ is hyperliveness for LTS because of the existential in the definition of $\approx$: any observation can be remedied by adding additional transitions. This remediation corresponds to a closure operator because it only adds traces; thus $\mathbf{BNDC}$ is a possibilistic information-flow policy.

Boudol and Castellani's noninterference. Boudol and Castellani [18] define a
bisimulation-based noninterference policy for concurrent programs. To model
this policy as a hyperproperty, we first formalize their model of program
execution. They model execution as a binary relation $\rightarrow$ on program terms and
memories; a program term $P$ and a memory $\mu$ step to a new program term $P'$
and memory $\mu'$. Define the set $S_P$ of states for program $P$ to be the set of pairs
of a program term and a memory, $\mathit{prog}(s)$ to be the program term from state $s$,
and $\mathit{mem}(s)$ to be the memory from state $s$. Define $\mathit{traces}(P)$ to be the set of all
traces $t$ such that $\mathit{prog}(t[0])$ is $P$, and for all $i$, $t[i] \rightarrow t[i+1]$. This construction
encodes $P$ as a set of traces and is an instance of our general construction for
encoding LTSs (cf. §5.3); here there are only LTS-states and no labels.

Second, we formalize Boudol and Castellani's security policy. Let $=_L$ be
an equivalence relation on memories such that $\mu_1 =_L \mu_2$ means $\mu_1$ and $\mu_2$ are
indistinguishable to a low observer. State $s$ can step to state $s'$ in program $P$,
denoted $\mathit{steps}_P(s, s')$, if

\[
(\exists t \in \Psi_{\mathrm{inf}}, i \in \mathbb{N} : t \in \mathit{traces}(P) \land t[i] = s \land t[i+1] = s').
\]

Define $\approx_L$ (read "bisimilar") to be a binary relation on $S_P$ such that if $s_1$ is
bisimilar to $s_2$, then $s_1$ and $s_2$ must have indistinguishable memories to a low
observer; further, if $s_1$ can step to state $s_1'$, then either $s_1'$ is bisimilar to $s_2$, or $s_2$ can
step to $s_2'$ where $s_1'$ and $s_2'$ are bisimilar. Formally, $\approx_L$ is the largest symmetric
binary relation on $S_P$ such that

\[
s_1 \approx_L s_2 \implies \mathit{mem}(s_1) =_L \mathit{mem}(s_2)
  \land (\forall s_1' \in \Sigma : \mathit{steps}_P(s_1, s_1') \implies s_1' \approx_L s_2
  \lor (\exists s_2' \in \Sigma : \mathit{steps}_P(s_2, s_2') \land s_1' \approx_L s_2')).
\]

Relation $\approx_L$ formalizes Definition 3.5 ($(\Gamma, \mathcal{L})$-Bisimulation) from [18].

Boudol and Castellani define program $P$ to be secure, which we denote
$\mathit{BCNI}(P)$, iff $P$ is bisimilar to itself in all initially low-equivalent memories:

\[
\mathit{BCNI}(P) \triangleq (\forall \mu_1, \mu_2 : \mu_1 =_L \mu_2 \implies (P, \mu_1) \approx_L (P, \mu_2)).
\]

$\mathit{BCNI}(P)$ formalizes Definition 3.8 (Secure Programs) from [18]. The
hyperproperty containing all secure programs according to Boudol and Castellani's
definition is

\[
\mathit{BCNI} = \{\, T \in \mathit{Prop} \mid T \in \mathit{LTS} \implies (\exists P : T = \mathit{traces}(P) \land \mathit{BCNI}(P)) \,\}.
\]

BCNI is hyperliveness because of the existential quantifier on $s_2'$ in the
definition of the bisimilarity relation $\approx_L$: any observation that contains traces
leading to non-bisimilar states can be remedied by adding additional traces
leading to bisimilar states. This remediation corresponds to a closure operator
because it only adds traces, thus BCNI is a possibilistic information-flow policy.


5.4  State  Machines 

Goguen and Meseguer [46] define a state machine as a tuple $(S, C, O, \mathit{out}, \mathit{do}, s_0)$,
where $S$ is a set of machine states, $C$ is a set of commands, $O$ is a set of outputs,
$\mathit{out}$ is a function from $S$ to $O$ yielding what output the user of the machine
observes when the machine is in a given state, $\mathit{do}$ is a function from $S \times C$
to $S$ describing how the machine transitions between states as a function of
commands, and $s_0$ is the initial state of the machine.7 Such state machines are
deterministic because $\mathit{do}$ is a function rather than a relation.

A state machine $M = (S, C, O, \mathit{out}, \mathit{do}, s_0)$ can be encoded as a set of traces.
The construction proceeds in two steps. First, $M$ is encoded as a labeled
transition system (cf. §5.3) by treating the machine commands and outputs as labels:
Let the set $S$ of LTS-states be set $S$ of machine states. Let the set $L$ of labels be
product set $C \times O$ of commands and outputs. Let the transition relation $\rightarrow$
include $(s, (c, o), s')$ whenever $\mathit{do}(s, c) = s'$ and $\mathit{out}(s') = o$. We now have a labeled
transition system $L = (S, L, \rightarrow)$. Second, the traces of $M$ are the traces of $L$ that
start with $s_0$: let $\mathit{traces}(M)$ be $\mathit{traces}(S, L, \rightarrow) \cap \{t \in \Psi_{\mathrm{inf}} \mid t[0] = s_0\}$.
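
The two-step construction can be sketched in code for a small machine. This is
only an illustration under stated assumptions: the toggle machine and the names
`lts_transitions` and `trace_prefixes` are hypothetical, and because traces are
infinite the sketch enumerates finite prefixes of a chosen length rather than
the traces themselves. The exact trace encoding of §5.3 is not reproduced here;
each prefix is simply the initial state followed by (label, state) steps.

```python
from itertools import product

def lts_transitions(states, commands, do, out):
    """Step 1: the LTS transition relation {(s, (c, o), s')}.

    A transition is included whenever do(s, c) = s' and out(s') = o.
    """
    return {(s, (c, out(do(s, c))), do(s, c)) for s in states for c in commands}

def trace_prefixes(commands, do, out, s0, n):
    """Step 2 (finite stand-in): all n-step prefixes that start with s0.

    Each prefix is a tuple: s0 followed by ((command, output), state) steps.
    """
    prefixes = set()
    for cmds in product(commands, repeat=n):
        state, steps = s0, [s0]
        for c in cmds:
            nxt = do(state, c)
            steps.append(((c, out(nxt)), nxt))
            state = nxt
        prefixes.add(tuple(steps))
    return prefixes

# Hypothetical machine: a one-bit toggle (not taken from the source).
STATES = {0, 1}
COMMANDS = ["flip", "noop"]

def do(s, c):
    return 1 - s if c == "flip" else s

def out(s):
    return "on" if s else "off"

transitions = lts_transitions(STATES, COMMANDS, do, out)
prefixes = trace_prefixes(COMMANDS, do, out, s0=0, n=2)
```

Because `do` is a function, each command sequence yields exactly one prefix,
which reflects the determinism noted above.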

The  set  SM  of  all  state  machines  is  a  hyperproperty: 

\[
\mathit{SM} = \{\, T \in \mathit{Prop} \mid (\exists M : T = \mathit{traces}(M)) \,\}. \tag{5.4.1}
\]

7Our definition of state machines simplifies Goguen and Meseguer's by omitting user
clearances, though the clearances still appear in the definition of GMNI.

Finally, we can declare that GMNI is hypersafety for SM, fulfilling our
expectation from the beginning of this chapter.

5.5  Probabilistic  Systems 

A probabilistic system is equipped with a function $p$ such that the system
transitions from a state $s$ to state $s'$ with probability $p(s, s')$.8 This probability is
Markovian because it does not depend upon past or future states in an execution;
nonetheless, dependence upon the past or future can be modeled by allowing
states to contain history or prophecy variables [1]. Function $p$ can itself even be
encoded into the state in various ways. For example, state $s$ could record $p(s, s')$
for all states $s'$. Or in a trace $t$, state $t[i]$ could record $p(t[i], t[i+1])$. This
latter encoding is an instantiation of the construction in §5.3 for encoding labeled
transition systems as sets of traces; here, the labels are probabilities. Either way,
probabilistic systems can be modeled as sets of traces. Define PR to be the set
of all trace sets that encode probabilistic systems; that is, trace set $T$ is in PR if
$T$ encodes a valid probability function $p(\cdot, \cdot)$.
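
As a small illustration of what "valid" means here, the following sketch checks
the two conditions on $p$ for a finite state space: every value lies in [0, 1]
and the outgoing probabilities of each state sum to 1. The dictionary
representation and the function name are assumptions made for the example, not
part of the formal development.

```python
from fractions import Fraction

def is_valid_probability_function(p, states):
    """Check that p is a valid transition probability function.

    p is assumed to be a dict mapping (s, s') to a probability;
    pairs that are absent from the dict count as probability 0.
    """
    for s in states:
        row = [p.get((s, s2), Fraction(0)) for s2 in states]
        if any(not (0 <= x <= 1) for x in row):
            return False
        if sum(row) != 1:
            return False
    return True

# Hypothetical two-state systems.
valid_p = {("a", "a"): Fraction(1, 2), ("a", "b"): Fraction(1, 2), ("b", "b"): Fraction(1)}
invalid_p = {("a", "a"): Fraction(1, 2), ("b", "b"): Fraction(1)}  # row "a" sums to 1/2
```

Exact rational arithmetic is used so that the sum-to-one test is not affected
by floating-point rounding.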

To obtain a probability measure on sets of traces, let $\Pr_{s,S}(T)$ denote the
probability with which set $T$ of finite traces is produced by probabilistic system $S$
beginning in initial state $s$.9 O'Neill et al. [96] show how to construct this
probability measure from $p$. We now demonstrate how the measure can be used in
the definitions of hyperproperties.

8To be a valid probability, $p(s, s')$ must be in the real interval [0, 1] for all $s$ and $s'$; and for all
$s$, it must hold that $\sum_{s'} p(s, s') = 1$.

9The initial state can be eliminated if we also assume a prior probability on initial states [52,
§6.5]. The requirement that the traces in $T$ be finite is, however, essential to ensure that $\Pr_{s,S}(T)$
is a valid probability measure.

Probabilistic noninterference. In information-flow security, the original
motivation for adding probability to system models was to address covert
channels and to establish connections between information theory and information
flow [48, 49, 88]. Probabilistic noninterference [49] emerged from this line of
research. Intuitively, this policy requires that the probability of every low trace be
the same for every low-equivalent initial state. To formulate probabilistic
noninterference as a hyperproperty, we need some notation. Let the low equivalence
class of a finite trace $t$ be denoted $[t]_L$, where

\[
[t]_L = \{\, t' \in \Psi_{\mathrm{fin}} \mid \mathit{ev}_L(t) = \mathit{ev}_L(t') \,\}.
\]

The probability that system $S$, starting in state $s$, produces a trace that is
low-equivalent to $t$ is therefore $\Pr_{s,S}([t]_L)$. Let the set of initial states of trace property
$T$ be denoted $\mathit{Init}(T)$, where

\[
\mathit{Init}(T) = \{\, s \mid \{s\} \le T \,\}.
\]

Probabilistic  noninterference  can  now  be  expressed  as  follows: 

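\[
\mathit{PNI} = \{\, T \in \mathit{Prop} \mid T \in \mathit{PR}
  \land (\forall s_1, s_2 \in \mathit{Init}(T) : \mathit{ev}_L(s_1) = \mathit{ev}_L(s_2)
  \implies (\forall t \in \Psi_{\mathrm{fin}} : \Pr_{s_1,T}([t]_L) = \Pr_{s_2,T}([t]_L))) \,\}. \tag{5.5.1}
\]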

PNI is not hyperliveness for PR, because a system that deterministically
produces two non-low-equivalent traces from two initial low-equivalent states
cannot be extended to satisfy PNI. Whether PNI is hypersafety for PR depends
on whether state space $\Sigma$ is finite. To see why, consider a system $T$ such that
$T \notin \mathit{PNI}$ and $T \in \mathit{PR}$. We can attempt to construct a bad thing $M$ for $T$ as
follows. Since $T \notin \mathit{PNI}$, there exists a trace $t_L$ of low events that is produced by
initial states $s_1$ and $s_2$ with differing probabilities. Let $M$ be the prefix of $T$ that
completely determines the probability of $t_L$ for those initial states:

\[
M = \{\, t \in \Psi_{\mathrm{fin}} \mid t[0] \in \{s_1, s_2\} \land t \le T \land \mathit{ev}_L(t) = t_L \,\}.
\]

Recall that bad things must be finitely observable and irremediable. $M$ is
irremediable because no extension of it can change the probability of $t_L$ for initial
states $s_1$ and $s_2$. But is $M$ finitely observable; that is, is $M \in \mathit{Obs}$? Recall that an
element of $\mathit{Obs}$ must be a finite set of finite traces. Each trace in $M$ is finite, but
$M$ might not be a finite set:

• If state space $\Sigma$ is countably infinite,10 there could be infinitely many states
to which $s_1$ (and $s_2$) transition. Hence there could need to be infinitely
many traces in $M$ to completely determine the probability of $t_L$, so $M$
could not be in $\mathit{Obs}$. Moreover, any finite subset $N$ of $M$ would necessarily
omit some states from $\Sigma$. So it might be possible to extend $N$ to a system $T'$
that satisfies PNI by adding traces containing those omitted states. Thus
$T$ would have no bad thing, and PNI would not be hypersafety for PR.

• If $\Sigma$ is finite, only finitely many finite traces are low-equivalent to $t_L$. Thus
$M$ is finite, and no extension $T'$ of $M$ can change the probability of $t_L$.
So $T'$ cannot be in PNI. Therefore PNI is hypersafety for PR.

Gray's definition of probabilistic noninterference [49] is hypersafety for PR,
because Gray required the state (and input and output) space to be finite. But the
definition of O'Neill et al. [96] is neither hypersafety nor hyperliveness, because
it allowed a countably infinite state space.
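
For a finite state space, the condition in (5.5.1) can be checked by brute force.
The sketch below is a simplification under stated assumptions: states are
hypothetical (high, low) pairs, only traces of one fixed length `n` are
compared, and the probability of a finite trace is taken to be the product of
its transition probabilities rather than the measure constructed in [96]. The
names `low_distribution` and `satisfies_pni` are illustrative.

```python
from fractions import Fraction
from itertools import product

def low_distribution(p, states, s0, n, ev_low):
    """Distribution over low observations of length-n traces that start in s0.

    Summing over traces with the same low observation plays the role of
    the probability of a low-equivalence class.
    """
    dist = {}
    for rest in product(states, repeat=n - 1):
        trace = (s0,) + rest
        prob = Fraction(1)
        for a, b in zip(trace, trace[1:]):
            prob *= p.get((a, b), Fraction(0))
        if prob:
            obs = tuple(ev_low(s) for s in trace)
            dist[obs] = dist.get(obs, Fraction(0)) + prob
    return dist

def satisfies_pni(p, states, initial, n, ev_low):
    """Low-equivalent initial states must induce equal low distributions."""
    for s1, s2 in product(initial, repeat=2):
        if ev_low(s1) == ev_low(s2):
            d1 = low_distribution(p, states, s1, n, ev_low)
            d2 = low_distribution(p, states, s2, n, ev_low)
            if d1 != d2:
                return False
    return True

# Hypothetical systems over (high, low) states.
STATES = [(h, l) for h in (0, 1) for l in (0, 1)]
INITIAL = [(0, 0), (1, 0)]
HALF = Fraction(1, 2)

def ev_low(s):
    return s[1]

# The low part is a fair coin flip, independent of the high part.
secure = {((h, l), (h, l2)): HALF for h in (0, 1) for l in (0, 1) for l2 in (0, 1)}
# The low part is overwritten with the high part.
leaky = {((h, l), (h, h)): Fraction(1) for h in (0, 1) for l in (0, 1)}
```

With these definitions, `satisfies_pni(secure, STATES, INITIAL, 3, ev_low)` holds
and the same call on `leaky` does not, since the two initial states then produce
different low traces with probability 1.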

10State space $\Sigma$ cannot be uncountably infinite without generalizing probability function $p(\cdot, \cdot)$
to a probability measure.

Secure encryption. A private-key encryption scheme is a tuple
$(\mathcal{M}, \mathcal{K}, \mathcal{C}, \mathit{Gen}, \mathit{Enc}, \mathit{Dec})$, where $\mathcal{M}$ is the message space, $\mathcal{K}$ is the key space,
and $\mathcal{C}$ is the ciphertext space such that the following hold:

• $\mathit{Gen}$ is the key-generation algorithm, a randomized algorithm that produces
a key $k \in \mathcal{K}$. We write $k \leftarrow \mathit{Gen}$ to denote the sampling of $k$ from the
probability distribution induced by $\mathit{Gen}$.

• $\mathit{Enc}$ is the encryption algorithm, an algorithm (either randomized or
deterministic) that accepts a key $k \in \mathcal{K}$, a plaintext message $m \in \mathcal{M}$, and yields
a ciphertext $c \in \mathcal{C}$ that is the encryption of $m$ using $k$. We denote this as
$c = \mathit{Enc}(m, k)$.

• $\mathit{Dec}$ is the decryption algorithm, a deterministic algorithm that accepts a key
$k \in \mathcal{K}$, a ciphertext $c \in \mathcal{C}$, and yields a plaintext $m$ that is the decryption of
$c$ using $k$. We denote this as $m = \mathit{Dec}(c, k)$.

• Decryption is the inverse of encryption. Formally, for all $m \in \mathcal{M}$ and
$k \in \mathcal{K}$, it holds that $\Pr(\mathit{Dec}(\mathit{Enc}(m, k), k) = m) = 1$.

A private-key encryption scheme satisfies perfect indistinguishability [61] if the
probability distribution on ciphertexts is the same for all plaintexts. Formally,
for all $m_1$, $m_2$, and $c$,

\[
\Pr(k \leftarrow \mathit{Gen} : \mathit{Enc}(m_1, k) = c) = \Pr(k \leftarrow \mathit{Gen} : \mathit{Enc}(m_2, k) = c).
\]
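
The definition can be checked exhaustively for a small scheme. The sketch below
assumes a deterministic $\mathit{Enc}$ and represents $\mathit{Gen}$ as a dictionary from keys to
probabilities; both choices, and the 2-bit XOR scheme used as the example, are
assumptions made for illustration.

```python
from fractions import Fraction

def ciphertext_distribution(gen, enc, m):
    """Pr(k <- Gen : Enc(m, k) = c) for every ciphertext c with nonzero probability."""
    dist = {}
    for k, pk in gen.items():
        c = enc(m, k)
        dist[c] = dist.get(c, Fraction(0)) + pk
    return dist

def perfectly_indistinguishable(messages, gen, enc):
    """True iff every plaintext induces the same ciphertext distribution."""
    dists = [ciphertext_distribution(gen, enc, m) for m in messages]
    return all(d == dists[0] for d in dists)

def xor(m, k):
    return m ^ k

MESSAGES = range(4)  # hypothetical 2-bit message space
uniform_gen = {k: Fraction(1, 4) for k in range(4)}   # one-time pad: uniform keys
biased_gen = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # keys 2 and 3 are never generated
```

With uniform keys every plaintext maps to the uniform distribution on
ciphertexts, so the scheme is perfectly indistinguishable; with the biased key
generator the high bit of the plaintext is visible in the ciphertext, and the
check fails.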

Perfect indistinguishability can be formulated as a hyperproperty on
probabilistic systems. To encode encryption scheme $(\mathcal{M}, \mathcal{K}, \mathcal{C}, \mathit{Gen}, \mathit{Enc}, \mathit{Dec})$ as a
probabilistic system, let the set of states of the system be

\[
\mathcal{M} \cup \mathcal{K} \cup \mathcal{C} \cup \{\mathit{Gen}\} \cup \{\mathit{Enc}(m, k) \mid k \in \mathcal{K}, m \in \mathcal{M}\}
  \cup \{\mathit{Dec}(c, k) \mid k \in \mathcal{K}, c \in \mathcal{C}\}.
\]

Let probability function $p(\cdot, \cdot)$ be defined such that

• $p(\mathit{Gen}, k) = \Pr(k \leftarrow \mathit{Gen})$,

• $p(\mathit{Enc}(m, k), c) = \Pr(c = \mathit{Enc}(m, k))$, and

• $p(\mathit{Dec}(c, k), m) = 1$ iff $\mathit{Dec}(c, k) = m$.

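Let the system so constructed from $(\mathcal{M}, \mathcal{K}, \mathcal{C}, \mathit{Gen}, \mathit{Enc}, \mathit{Dec})$ be denoted

\[
\mathit{encSys}(\mathcal{M}, \mathcal{K}, \mathcal{C}, \mathit{Gen}, \mathit{Enc}, \mathit{Dec}),
\]

and let the set of all such systems be ES. The following hyperproperty expresses
perfect indistinguishability:

\[
\mathit{PI} = \{\, T \in \mathit{Prop} \mid T \in \mathit{ES}
  \land (\exists \mathcal{M}, \mathcal{K}, \mathcal{C}, \mathit{Gen}, \mathit{Enc}, \mathit{Dec} :
  T = \mathit{encSys}(\mathcal{M}, \mathcal{K}, \mathcal{C}, \mathit{Gen}, \mathit{Enc}, \mathit{Dec})
  \land (\forall m_1, m_2 \in \mathcal{M}; c \in \mathcal{C} :
  \Pr(\mathit{Enc}(m_1) = c) = \Pr(\mathit{Enc}(m_2) = c))) \,\}, \tag{5.5.2}
\]

where $\Pr(\mathit{Enc}(m) = c)$ denotes

\[
\sum_{k \in \mathcal{K}} \Pr_{\mathit{Gen},T}(\{\mathit{Gen}, k\}) \cdot \Pr_{\mathit{Enc}(m,k),T}(\{\mathit{Enc}(m, k), c\}).
\]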

PI is hypersafety for ES because any encryption scheme that is not in PI
has a ciphertext $c$ and two messages $m_1$, $m_2$ such that the probability that $m_1$
encrypts to $c$ is not equal to the probability that $m_2$ encrypts to $c$. Trace set
$\{\mathit{Enc}(m, k), c \mid k \in \mathcal{K}, m \in \{m_1, m_2\}\}$ thus is irremediable, and it is finite
assuming that key space $\mathcal{K}$ is finite. So the trace set is a bad thing. But note that PI is
not subset closed for Prop, so stepwise refinement is not applicable with PI.

Other definitions of secure encryption, such as computational
indistinguishability in various attacker models (including IND-CPA and IND-CCA),
can similarly be formulated as hyperproperties.

Quantification of information flow. Probability can also be used to reason
about the amount of information that a system can leak. For example, channel
capacity is the maximum rate at which information can be reliably sent over
a channel [106]; Gray [49] formulates as a channel the leakage of secret
information from a system, and he quantifies the capacity of that channel. The
hyperproperty "The channel capacity is $k$ bits" (denoted $\mathit{CC}_k$) is hyperliveness for
PR, since no matter what the rate is for some finite prefix of the system, the rate
can be changed to any arbitrary amount by an appropriate extension that conveys
more or less information.

Chapter  2  gives  a  model  and  metric  for  quantifying  the  leakage  over  a  series 
of  experiments  on  a  program  S.  The  policy  specifying  that  the  leakage  is  less 
than  k  bits  for  all  experiments,  denoted  QLk,  is  hypersafety  for  a  variant  of  PR, 
as  we  now  show. 

Recall that a state of a probabilistic program has an immutable high
projection and a mutable low projection, that a repeated experiment on probabilistic
program $S$ is a finite sequence of executions of $S$, and that each individual
execution is an experiment. An experiment can be represented with two states: an
initial state, in which inputs are provided to the program, and a final state, in
which outputs are given by the program. All initial states (across all executions)
in a repeated experiment must have the same high projection but may have
different low projections. Recall that the probabilistic behavior of $S$ is modeled by
a semantics $\llbracket S \rrbracket$ that maps input states to output distributions, where $(\llbracket S \rrbracket s)(s')$
is the probability that $S$ begun in state $s$ terminates in state $s'$. An attacker
begins an experiment with a prebelief about the high projection of the initial state.
After observing the output of the execution, the attacker updates his prebelief
to produce a postbelief about the high projection of the initial state.

We here use traces and events to represent repeated experiments, where each
state in a trace produces an event. The events alternate between input and
output, and the first event in a trace must be an input. Each output must have the
correct probability of occurring according to $\llbracket S \rrbracket$ and the most recent input.11
Each low input projection may vary, but the high projection must be the same
in each input. Let $\mathit{Syst}(S)$ denote the system of such traces resulting from
program $S$:

\[
\mathit{Syst}(S) = \{\, t \in \Psi_{\mathrm{fin}} \mid (\forall i : 0 < 2i + 1 < |t|
  \implies \mathit{ev}_{Hin}(t[2i]) = \mathit{ev}_{Hin}(t[0])
  \land p(t[2i], t[2i+1]) = (\llbracket S \rrbracket t[2i])(t[2i+1])) \,\},
\]


where $|t|$ denotes the length of finite trace $t$, and $p(\cdot, \cdot)$ is the probability function
used in §5.5. From $\mathit{Syst}(S)$ we can construct probability measure $\Pr_{s,\mathit{Syst}(S)}$,12 also
used in §5.5.

Each  pair  of  states  t[i]  and  t[i  +  1],  for  even  i,  in  repeated  experiment  t  yields 
an  experiment.  An  experiment  is  described  formally  by  a  prebelief,  a  high  input, 
a  low  input,  a  low  output,  and  a  postbelief. 
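
The trace layout can be made concrete with a short sketch. It assumes states are
hypothetical (high, low) pairs and covers only the structural part of the
definition of $\mathit{Syst}(S)$: pairing each input state with the following output
state, and checking that all input states share the high projection of $t[0]$.
The probabilistic condition on outputs is not modeled here.

```python
def experiments(trace):
    """Split a repeated experiment into (input state, output state) pairs.

    Pair i consists of trace[2i] and trace[2i + 1]; a trailing input
    state without an output is ignored.
    """
    return [(trace[2 * i], trace[2 * i + 1]) for i in range(len(trace) // 2)]

def high_inputs_agree(trace, ev_hin):
    """Check that every input state has the same high projection as trace[0]."""
    return all(ev_hin(inp) == ev_hin(trace[0]) for inp, _ in experiments(trace))

def ev_hin(state):
    """High projection of a (high, low) state; an assumption of this sketch."""
    return state[0]

# Hypothetical repeated experiment on a password checker: two guesses.
good = (("pw", "guess1"), ("pw", "reject"), ("pw", "guess2"), ("pw", "accept"))
# Same shape, but the high input changes between executions.
bad = (("pw", "guess1"), ("pw", "reject"), ("other", "guess2"), ("other", "accept"))
```

Here `good` is a well-formed repeated experiment with two experiments, while
`bad` violates the requirement that the high projection stay fixed.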

11 A  representation  in  which  each  finite  trace  contains  two  states  (initial  and  final)  might  at 
first  seem  suitable  for  repeated  experiments.  That  representation  would  fail  to  preserve  the 
order  in  which  inputs  are  provided  (in  initial  states)  across  the  sequence  of  executions  in  the 
repeated  experiment.  However,  a  single  trace  with  many  states  does  capture  this  order. 

12Note that $p(s, s')$ is defined only at every other state in each trace of $\mathit{Syst}(S)$, so to construct
the measure we treat each pair of states in the trace as a single state. Also note that the set of
program states must be finite for the probability measure to be well-defined.

As part of determining the postbelief for an experiment, the attacker's
prediction $\delta_A$ of the low output is calculated from prebelief $b_H$ and low input $l$:

\[
\delta_A(b_H, l) = \lambda s .\; b_H(\mathit{ev}_{Hin}(s)) \cdot \Pr_{r,\mathit{Syst}(S)}(\{rs\}),
\]

where $r$ is the state that has $\mathit{ev}_{Hin}(s)$ as its high projection and $l$ as its low
projection. Denote the $i$th experiment in trace $t$, with initial prebelief $b_H$, as $\mathcal{E}(t, i, b_H)$.
We define $\mathcal{E}(t, i, b_H)$ using OCaml-style record syntax:

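\[
\begin{aligned}
\mathcal{E}(t, i, b_H) = \{\; & \mathit{preBelief} = \text{if } i > 0 \text{ then } \mathcal{E}(t, i-1, b_H).\mathit{postBelief} \text{ else } b_H; \\
& \mathit{highIn} = \mathit{ev}_{Hin}(t[2i]); \\
& \mathit{lowIn} = \mathit{ev}_L(t[2i]); \\
& \mathit{lowOut} = \mathit{ev}_L(t[2i+1]); \\
& \mathit{postBelief} = (\delta_A(b_H, l) \mid \mathit{lowOut}) \upharpoonright H \;\},
\end{aligned}
\]

where $\mid$ is the distribution conditioning operator, and $\upharpoonright$ is the distribution
projection operator, defined in §2.1.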
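
The computation of a postbelief can be sketched for finite distributions. The
sketch assumes one plausible reading of the operators from §2.1, which are not
reproduced in this chapter: prediction forms a joint distribution over high
inputs and low outputs, conditioning keeps the entries that match the observed
low output and renormalizes, and projection sums out the low output. The
password checker, its belief values, and all function names are illustrative.

```python
from fractions import Fraction

def predict(pre_belief, semantics, low_in):
    """Attacker's prediction: joint distribution over (high, low output)."""
    pred = {}
    for h, ph in pre_belief.items():
        for o, po in semantics(h, low_in).items():
            pred[(h, o)] = pred.get((h, o), Fraction(0)) + ph * po
    return pred

def condition(dist, low_out):
    """Condition a joint distribution on the observed low output."""
    kept = {ho: pr for ho, pr in dist.items() if ho[1] == low_out}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("observation has probability zero under the prediction")
    return {ho: pr / total for ho, pr in kept.items()}

def project_high(dist):
    """Project a joint distribution onto its high component."""
    post = {}
    for (h, _), pr in dist.items():
        post[h] = post.get(h, Fraction(0)) + pr
    return post

def post_belief(pre_belief, semantics, low_in, low_out):
    return project_high(condition(predict(pre_belief, semantics, low_in), low_out))

def checker(password, guess):
    """Hypothetical deterministic password checker, as an output distribution."""
    return {("accept" if password == guess else "reject"): Fraction(1)}

pre = {"A": Fraction(98, 100), "B": Fraction(1, 100), "C": Fraction(1, 100)}
post = post_belief(pre, checker, low_in="A", low_out="reject")
```

In this example the attacker strongly believes the password is A, guesses A, and
observes a rejection; the resulting postbelief rules out A and splits its mass
evenly between B and C.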

The quantity of flow in experiment $\mathcal{E}(t, i, b_H)$ is denoted $Q(\mathcal{E}(t, i, b_H))$ and
defined in §2.3.1. The quantity of flow over repeated experiment $t$ with initial
prebelief $b_H$, denoted $Q(t, b_H)$, is the sum of the flow for each experiment in $t$:

\[
Q(t, b_H) = \sum_{i=0}^{(|t|-1)/2} Q(\mathcal{E}(t, i, b_H)).
\]

Hyperproperty $\mathit{QL}_k$ is the set of all systems that exhibit at most $k$ bits of flow
over any experiment:

\[
\mathit{QL}_k \triangleq \{\, T \in \mathit{Prop} \mid (\exists S : T = \mathit{Syst}(S) \implies (\forall t \in T, b_H : Q(t, b_H) \le k)) \,\}.
\]


5.6  Results  on  Generalized  Hypersafety  and  Hyperliveness


The  results  proved  in  chapter  4  about  hypersafety  and  hyperliveness  generalize 
naturally  to  specific  system  representations.13  Informally,  the  generalizations 
are  as  follows: 

•  If  P  is  safety  (liveness)  for  Rep,  then  [P]  is  hypersafety  (hyperliveness)  for 
Rep  (generalizing  propositions  4.1  and  4.2). 

• If P is hypersafety for Rep, then P is subset closed for Rep, but not
necessarily subset closed for Prop (generalizing theorem 4.1). Consequently,
stepwise refinement does not necessarily work with hyperproperties that
are hypersafety for Rep.

• If P is a possibilistic information-flow policy for Rep, then P is
hyperliveness for Rep (generalizing theorem 4.3).

• $k$-hypersafety for Rep can be reduced to safety for $\mathit{Rep}^k$ (generalizing
theorem 4.2).

•  Every  hyperproperty  for  Rep  is  the  intersection  of  a  safety  hyperproperty 
for  Rep  with  a  liveness  hyperproperty  for  Rep  (generalizing  theorem  4.5). 

We  give  the  formal  statements  of  these  generalized  results  below.  The  proofs  of 
these  results  are  all  straightforward  corollaries  of  the  original  results,  although 
some  proofs  require  additional  assumptions  about  Rep. 

13We do not generalize the topological results here. However, since the intersection theorem
generalizes, we believe that the topological results could also be generalized.

First, we must define safety and liveness for system representations. Let
$\mathit{Tr}(\mathit{Rep})$ denote the set of all traces that are contained in any system in Rep; that
is, $\mathit{Tr}(\mathit{Rep}) = \bigcup_{T \in \mathit{Rep}} T$. Let $\mathit{Obs}(\mathit{Tr}(\mathit{Rep}))$ denote the set of all finite traces that
are prefixes of some trace in $\mathit{Tr}(\mathit{Rep})$; that is,
$\mathit{Obs}(\mathit{Tr}(\mathit{Rep})) = \{t \in \Psi_{\mathrm{fin}} \mid (\exists t' \in \mathit{Tr}(\mathit{Rep}) : t \le t')\}$.
Let the lift $[P]_{\mathit{Rep}}$ of property $P$ in Rep be $\mathcal{P}(P) \cap \mathit{Rep}$. A trace
property $S$ is a safety property for system representation Rep iff

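\[
(\forall t \in \mathit{Tr}(\mathit{Rep}) : t \notin S \implies (\exists m \in \mathit{Obs}(\mathit{Tr}(\mathit{Rep})) : m \le t
  \land (\forall t' \in \mathit{Tr}(\mathit{Rep}) : m \le t' \implies t' \notin S))).
\]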

A trace property $L$ is a liveness property for system representation Rep iff

\[
(\forall t \in \mathit{Obs}(\mathit{Tr}(\mathit{Rep})) : (\exists t' \in \mathit{Tr}(\mathit{Rep}) : t \le t' \land t' \in L)).
\]

Note that, compared to the original definitions of safety and liveness in
chapter 4, we have simply replaced $\Psi_{\mathrm{inf}}$ with $\mathit{Tr}(\mathit{Rep})$, and $\Psi_{\mathrm{fin}}$ with $\mathit{Obs}(\mathit{Tr}(\mathit{Rep}))$.
Let $\mathit{SP}(\mathit{Rep})$ be the set of all safety properties for Rep, and let $\mathit{LP}(\mathit{Rep})$ be the set
of all liveness properties for Rep. Likewise, let $\mathit{SHP}(\mathit{Rep})$ be the set of all safety
hyperproperties for Rep, and let $\mathit{LHP}(\mathit{Rep})$ be the set of all liveness
hyperproperties for Rep.

Generalization of proposition 4.1. If $(\forall t \in \mathit{Tr}(\mathit{Rep}) : \{t\} \in \mathit{Rep})$, then

\[
(\forall S \in \mathcal{P}(\mathit{Rep}) : S \in \mathit{SP}(\mathit{Rep}) \iff [S]_{\mathit{Rep}} \in \mathit{SHP}(\mathit{Rep})).
\]

The forward direction of this generalization always holds, but the backward
direction ($\Leftarrow$) might not hold if Rep does not allow individual traces from
$\mathit{Tr}(\mathit{Rep})$ to be representations: the bad thing for a safety hyperproperty could
never be an individual trace, hence the safety hyperproperty could not be the
lift of a safety property. So the backward direction requires the assumption that
any individual trace in $\mathit{Tr}(\mathit{Rep})$ is itself a system representation in Rep; that is,
$(\forall t \in \mathit{Tr}(\mathit{Rep}) : \{t\} \in \mathit{Rep})$. Note that Prop satisfies this assumption.

Generalization of proposition 4.2. If $(\forall T \subseteq \mathit{Tr}(\mathit{Rep}) : T \in \mathit{Rep})$, then

\[
(\forall L \in \mathcal{P}(\mathit{Rep}) : L \in \mathit{LP}(\mathit{Rep}) \iff [L]_{\mathit{Rep}} \in \mathit{LHP}(\mathit{Rep})).
\]

The backward direction of this generalization always holds, but the forward
direction ($\Rightarrow$) might not hold if Rep does not allow arbitrary unions of
individual traces from $\mathit{Tr}(\mathit{Rep})$ to be representations: the union of the individual
good things for a liveness property would not necessarily be good for the lift
of that liveness property. So the forward direction requires the assumption that
arbitrary unions of individual traces in $\mathit{Tr}(\mathit{Rep})$ are themselves system
representations in Rep; that is, $(\forall T \subseteq \mathit{Tr}(\mathit{Rep}) : T \in \mathit{Rep})$. Note that Prop satisfies
this assumption.

Generalization of theorem 4.1. If $(\exists L \in \mathit{LP}(\mathit{Rep}) : L \ne \mathit{Tr}(\mathit{Rep}))$, then

\[
\mathit{SHP}(\mathit{Rep}) \subset \mathit{SSC}(\mathit{Rep}).
\]

$\mathit{SSC}(\mathit{Rep})$ is the set of all hyperproperties for Rep that are subset closed on Rep:

\[
P \in \mathit{SSC}(\mathit{Rep}) \iff (\forall T \in P : (\forall T' \in \mathit{Rep} : T' \subseteq T \implies T' \in P)).
\]
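
The difference between being subset closed on Rep and being subset closed on
Prop can be seen on a toy example. The sketch below assumes finite trace sets
(with traces abbreviated as strings) and a hypothetical three-element Rep; it
computes the lift $[P]_{\mathit{Rep}}$ defined earlier in this section and tests closure
against two different universes. All names are illustrative.

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, as frozensets (a finite stand-in for Prop)."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}

def lift(prop, rep):
    """[P]_Rep: the systems in rep all of whose traces are in prop."""
    return {T for T in rep if T <= prop}

def subset_closed_on(hyper, universe):
    """True iff every member of universe below some T in hyper is also in hyper."""
    return all(T2 in hyper for T in hyper for T2 in universe if T2 <= T)

TRACES = {"aa", "ab", "ba", "bb"}
REP = {
    frozenset({"aa", "ab"}),
    frozenset({"ba", "bb"}),
    frozenset({"aa", "ab", "ba"}),
}
P = {"aa", "ab", "ba"}   # a trace property: "bb" is excluded

H = lift(P, REP)
```

Here `H` is subset closed on `REP`, yet it is not subset closed on the full
powerset of `TRACES`, because subsets such as `{"aa"}` are not system
representations in `REP` and so cannot belong to `H`.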

The strictness of the subset in the theorem generalization requires the
assumption that there exist subset-closed hyperproperties that are not safety. But it
suffices to instead assume that hyperliveness is not trivial for Rep; that is,
$(\exists L \in \mathit{LP}(\mathit{Rep}) : L \ne \mathit{Tr}(\mathit{Rep}))$. Note that Prop satisfies both assumptions.

Generalization of theorem 4.2.

\[
(\forall S \in \mathit{Rep}, \mathbf{K} \in \mathit{KSHP}(k)(\mathit{Rep}) : (\exists K \in \mathit{SP}(\mathit{Rep}) : S \models \mathbf{K} \iff S^k \models K)).
\]

$\mathit{KSHP}(k)(\mathit{Rep})$ is the subset of $\mathit{SHP}(\mathit{Rep})$ where the size of bad thing $M$ is
bounded by $k$.

Generalization of theorem 4.3. If there exists some liveness hyperproperty for
Rep that is not a possibilistic information-flow policy for Rep, then

\[
\mathit{PIF}(\mathit{Rep}) \subset \mathit{LHP}(\mathit{Rep}).
\]

$\mathit{PIF}(\mathit{Rep})$ is the set of all possibilistic information-flow policies expressed by
closure operators $\mathit{Cl}$ of type $\mathit{Rep} \rightarrow \mathit{Rep}$. The strictness of the subset requires the
assumption of the existence of a liveness hyperproperty for Rep that is not a
possibilistic information-flow policy for Rep. Note that Prop satisfies this
assumption.

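Generalization of theorem 4.5.

\[
(\forall P \in \mathcal{P}(\mathit{Rep}) : (\exists S \in \mathit{SHP}(\mathit{Rep}), L \in \mathit{LHP}(\mathit{Rep}) : P = S \cap L)).
\]

The proof of this generalization requires the following generalized definition:

\[
\mathit{Safe}(P) = \{\, T \in \mathit{Rep} \mid (\forall M \in \mathit{Obs}(\mathit{Rep}) : M \le T
  \implies (\exists T' \in \mathit{Rep} : M \le T' \land T' \in P)) \,\}.
\]

Also, in the definition of $\mathit{Live}(P)$, notation $\overline{H}$ must now denote the complement
of hyperproperty $H$ with respect to Rep.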


5.7  Summary 

This chapter has classified several security policies with hypersafety and
hyperliveness for particular system representations. Figure 5.1 summarizes this
classification.

Figure 5.1: Classification of security policies for system representations

We have shown that the theory of hyperproperties can be generalized to apply
to system representations such as relational semantics, labeled transition
systems, state machines, and probabilistic systems. In each case, we encode the
system representation into trace sets, thus into hyperproperties. All of our
theorems about hyperproperties continue to hold for system representations, though
some additional assumptions about the system representation are needed.


CHAPTER  6


CONCLUSION 

In practice, computer security policies are often expressed as informal
requirements in natural languages (e.g., English), which are inherently ambiguous.
But security policies can also be expressed precisely with mathematical
models and notations, and this precision makes policies amenable to analysis
both by humans and computers.

This dissertation has developed such mathematical foundations. Information
theory was used in chapters 2 and 3 to quantify information-flow security.
This quantification is useful for analyzing the security of systems whose proper
operation requires leakage of information, such as password checkers and
statistical databases. We showed that accuracy of belief can be used to quantify
information flow for both confidentiality and integrity, and that accuracy
generalizes previous metrics based on uncertainty. Hyperproperties were used in
chapters 4 and 5 to formalize security policies. This formalization is the first to
enable expression of all kinds of security requirements in a uniform framework.
We showed that the theory of trace properties generalizes to hyperproperties.

The historical background in §1.1 began with the taxonomy of confidentiality,
integrity, and availability. More research is needed on the relationship between
this taxonomy and the formalisms we have studied. For quantitative
flow, we have given definitions for confidentiality and integrity, but availability
remains unexplored. For hyperproperties, the relationship with the taxonomy
is an open question, but we can offer some observations:

• Information-flow confidentiality is not a trace property, but it is a
hyperproperty, and it can be hypersafety (e.g., observational determinism) or
hyperliveness (e.g., generalized noninterference).

• Integrity, as the information-flow dual of confidentiality, includes
examples from both hypersafety and hyperliveness. And when stipulating
access control on changes to data and other resources, integrity is safety.

• Availability is sometimes hypersafety (maximum response time in any
execution, which is also safety) and sometimes hyperliveness (mean
response time over all executions).

The classification of security requirements as confidentiality, integrity, and
availability therefore would seem to be orthogonal to hypersafety and
hyperliveness.

More research is also needed on how to obtain assurance that real systems
meet the security definitions we have given. For quantitative flow, one
important open question is how to make our theoretical policies practical in real
systems, either by enforcing a limit on information flow or by measuring the actual
amount of information flow. For hyperproperties, we gave a relatively complete
verification methodology for $k$-hypersafety properties, but whether there is a
relatively complete verification methodology for all hyperproperties remains
an important open question.

The  immediate  goal  of  the  research  presented  in  this  dissertation  is  to  im¬ 
prove  our  understanding  of  the  foundations  of  computer  security  so  that  we 
can  specify  system  security  requirements  and  gain  assurance  that  systems  meet 
those  requirements.  But  the  ultimate  goal  is  to  ameliorate  the  real-world  con¬ 
sequences  of  security  vulnerabilities.  These  vulnerabilities  were  a  motivation 
for  the  1991  report  by  the  System  Security  Study  Committee  of  the  National 
Research  Council: 

"Computer  systems  are  coming  of  age.  As  [they]  become  more 
prevalent,  sophisticated,  . . .  and  interconnected,  society  becomes 
more  vulnerable  to  poor  system  design,  accidents . . . ,  and  attacks.
Without  more  responsible  design  and  use,  system  disruptions  will 
increase,  with  harmful  consequences  for  society."  [92,  Executive 
Summary] 

Now, almost two decades later, it seems clear not only that the Committee
was right, but that the potential for disruptions and the severity of their
consequences continue to increase. It is my hope that the research presented in
this dissertation will in some way help to reduce the economic, defense, and
social consequences of security vulnerabilities.
Symbols

—l  (low  equivalence),  19, 116 
[■]l  (low  equivalence  class),  169 
[•]  (semantics),  46 

(low  equivalence),  116 
⊗ (product), 20
|  (completion),  135 

<  (prefix),  137 
[•]  (Hft),  114 

<  (prefix),  123 

(  (projection),  19,  87 
(=  (satisfaction),  112, 113 
x  (parallel  self-composition),  127 
Ψ (traces), 111
*  (lift),  47 

f[(. .)*(..)]  (indexing),  112 
|  (belief  update,  conditioning),  20 
|  (distribution  conditioning),  25 
;  (self-composition),  125 

1-safety, 126
2-liveness, 162
2-safety, 126

A 

abstraction  function,  120 


AC,  113 

access  control,  3, 4, 112, 122, 182 
accuracy,  7,  8, 15,  30,  34 
vs.  uncertainty,  34 
action,  164 

admissibility  restriction,  21,  37,  56,  58, 
88 

agent,  16,  22 
anonymizer,  99 
assets,  see  information 
assurance,  182 
attacker,  6,  22, 49 
attenuation,  103 
audit,  1 

authorization  policy,  3 
availability,  1-3, 181 

B 

B,  see  belief  revision 
bad  thing,  11, 122, 170 
bandwidth,  5 
base,  134 

Bayesian  inference,  8,  27,  58 
BCNI,  166 
behavior,  120, 158 
belief, 7, 8, 16, 19-21, 23, 87


distance,  20,  30 
product,  19 
update,  20 

belief  revision,  27,  29,  30 
Bell-LaPadula  model,  4, 104 
Biba  model,  86, 104 
binary  symmetric  channel,  98 
bisimilar,  166 
bisimulation,  117, 163 
BNDC,  165 

C

C  (closed  sets),  135 
Cartesian,  46 
category  set,  4 
CCk,  173 

Cesaro  means,  118 
channel,  5,  84,  94 

capacity,  55,  58, 104, 173 
covert,  5, 169 
termination,  161 
ciphertext,  171 
Cl,  131 

Clark-Wilson  model,  104 
Clc,  137 

clearance,  4, 115, 158, 167 


closed  set,  14, 134 
closure  operator,  131, 167, 179 
code-word,  97 
command,  167 
Common  Criteria,  3 
complete  partial  order,  47 
completion,  134, 135 
concatenation,  112 
concurrency,  165 

confidentiality,  1-3,  9, 101, 115, 181 
contamination,  83, 102 
critical  section,  120 
CSP,  139 

D 

D  (distance),  21 
D (dense sets), 135
data,  see  information 
denotational  semantics,  17,  46,  52 
dense  set,  14, 134 
deterministic 

program,  36,  45, 104 
system,  115, 116, 159, 167 
Dist,  17 

distribution,  16,  20,  41,  46,  87 
maximum entropy, 21
uniform,  21,  33,  43,  55, 104 


unnormalized,  49 
distribution  update,  47 


divergence,  161 
dual,  83, 102, 104, 182 
dynamic  analysis,  104 
E

encryption, 171
entropy, 6, 7, 33, 55, 56, see relative entropy

conditional,  56 
error-correcting  code,  84,  97 
ES,  172 
event,  115, 164 
execution, 111, 112, 133, 158
experiment,  22, 173 
confidentiality,  22 
contamination,  87 
number,  45 
protocol,  23,  28,  49 
insider,  52 
repeated,  43 
suppression,  91 
extension,  11, 134 
F

false, 130

file-system  permissions,  3 


fixed  point,  47 
frequency,  16 

frequency  distribution,  see  distribution 
function, 111

G 

generalized  noninterference,  54,  116, 
124, 130, 131, 181 
GMNI,  115, 158 
GNI,  116 

good  thing,  12, 128 
GS,  113 

guaranteed  service,  113, 128 
H

H (high), 18, 115
Hamming  distance,  98 
hiding,  103 
high  information,  101 
history  variable,  51, 168 
HP,  113 
humans,  1 
hygiene,  103 

hyperliveness,  13, 128-133, 136, 139 
for  Rep,  160, 176 
hyperproperty,  13, 110, 113 
expressivity,  114, 118 
intersection  theorem,  140 
hypersafety, 13, 122-124, 136, 139
for  Rep,  160, 176 
hypothesis,  27 

I 

indistinguishability,  171, 173 
information,  2,  9,  31,  90,  93 
information  flow,  4,  see  quantity  of 
flow 

as  hyperproperty,  114-117 
correlation,  117 
not  trace  property,  12 
possibilistic,  130, 165, 167, 176, 179 
reduction  to  safety  property,  127 
verification,  125 

information  theory,  6,  9, 10,  84,  94,  98 
Insider,  50 
insider,  9,  49 
insider  choice,  49 
insider  function,  50,  52,  59 
integrity,  1-3,  9,  83, 101, 181 
interpretation  function,  120 
invariance,  11, 13, 125, 127 
irremediable,  122, 170 

K 

k-safety, 13, 125-128

KSHP,  126 


L 

L  (low),  18, 115 
label 

confidentiality,  18,  25 
integrity,  85 

labeled  transition  systems,  14,  111, 
163-167 

probabilistic,  57 
lattice,  4 
leakage 

one  bit,  34 
LHP,  129 
lift,  114 

liveness hyperproperty, see hyperliveness

liveness  property,  12, 129 
logic,  119, 125, 131, 141 
low  information,  101 
low  projection,  19 
LP,  129 
LTS, 163

M 

Markovian,  168 
mass,  17 

mean  response  time,  12,  110,  117,  129, 
182 
medical information system, 139
memory,  165 
message,  171 
misinformation,  9,  36 
multilevel  security,  4 
mutable  inputs,  28, 38,  88 
mutual  information,  6,  7,  57,  94 
conditional,  37,  42,  56 

N 

National  Research  Council,  182 
nearest-neighbor  decoding,  98 
NNT,  131 
noise,  84,  85 
nondeducibility,  130 
nondeterminism,  121, 137, 138, 159 
nondeterministic  choice,  9,  59,  60 
nondeterministic  system,  13,  53, 
116, 130 

noninterference,  15,  53, 110, 158 
bisimulation,  163 
Goguen  and  Meseguer,  4, 115 
not  trace  property,  12 
probabilistic,  58,  59, 169 
reduction  to  safety  property,  127 
relational,  161 
nontermination,  29 


NRW, 112

O

O  (open  sets),  135 
Ob,  135 
object,  3, 113 
Obs,  123 
Obs(Rep), 160
ObsDet,  53 

observable,  122, 130, 170 
hyperproperty,  138 
property,  133 
observation,  24,  89 

Bayesian  inference,  27 
of  nontermination,  29 
of  system,  123, 135, 160 
observational  determinism,  9, 130, 139, 
181 

as  hyperproperty,  116 
equivalent  to  zero  flow,  53 
is  hypersafety,  124 
not  2-safety  property,  127 
subset  closed,  122 
observer,  133, 161, 164, 166 
OD,  116 
open  set,  134 

operational  semantics,  160 
optimal code, 21

Orange  Book,  see  Trusted  Computer 
System  Evaluation  Criteria 
Osb,  135 
outcome,  30 

P 

paper,  83 
password,  23 
password  checker,  6, 181 
expected  flow,  41 
experiment  on,  25 
faulty,  39 

maximum  flow,  43 
PWC,  6,  25 

quantity  of  flow,  32-33 
Pci,  131 
Petri  net,  12 
PI,  172 
PIF,  131 
PNI,  169 
point  mass,  17 
possible  world,  19 
postbelief,  23, 174 
powerdomain,  137 
PR,  168 
Pr,  168, 174 


prebelief,  23,  88, 174 
predicate, 111
prediction,  24,  88, 175 
probabilistic  choice 

and  insider  choice,  50 
command,  18,  48,  60,  98 
made  by  program,  31,  90,  93 
misinformation,  36 

probabilistic systems, 14, 111, 130, 168-175

probability distribution, 8, see distribution

probability  measure,  20,  47,  48,  168, 
170, 174 

program  term,  165 
programmers,  120 
Prop,  112, 159 

property, 11, see hyperproperty, see system
property, see trace property
prophecy  variable,  51, 168 
public,  18 

Q 

Q  (quantity  of  flow),  31 
Qcon,  89 

Qe,  39 
QLk,  175 
Qmax, 43

Qtrans, 92

quantification,  17 
quantity  of  flow,  6, 181 

as  improvement  in  accuracy,  31 
confidentiality  and  integrity,  102 
contamination,  89-90 
expected,  39, 41 

integrity  vs.  confidentiality,  100 
maximum,  43 
suppression,  92-97 
query,  84,  99 

R 

reality,  34, 45 
receiver,  84 

refinement,  13, 120, 124, 172, 176 
mapping,  120 
paradox,  54, 121 
Rel,  161 

relational  systems,  14, 160-163 
relative  entropy,  8,  21 

in  database  privacy,  58, 104 
Rep,  159 

repetition  code,  97 
request,  113, 117, 128 
resource,  see  information 


response,  84, 100, 113, 117, 128 
right,  3, 113 

S

S  (suppression),  92 
Σ (states), 111, 170
5a,  95 

safety  hyperproperty,  see  hypersafety 
safety  property,  11, 123, 126 
SAG,  132 
scheduler,  130 
secret,  18 

secret  sharing,  126, 128, 139 
SecS, 128 

security  domains,  110 
security  level,  4 
security  policies,  1,  83, 110, 158 
are  not  trace  properties,  12 
classification,  140, 180 
selective  interleaving  function,  131 
self-composition,  125 
semaphore,  120 
sender,  84 
sequence, 111

service  level  agreement,  117 
SHP,  124 
SM,  115, 168 
society, 182
SP,  97 
SP,  123 
Spa,  97 

specification,  120 
SSC,  122 

starvation  freedom,  11, 128 
state, 111, 173
final,  161 
initial,  161, 168 
low  equivalence,  19 
LTS,  163 
program,  16,  87 
State,  17 
States,  18 
States,  18 

state machines, 14, 57, 111, 115, 158, 167

state  update,  47 
static  analysis,  55,  56,  58 
statistical  database,  84,  99, 181 
stepwise  refinement,  see  refinement 
strategy,  24, 42, 45, 59 
structured  protection,  110 
stuttering,  115, 116, 120 
subbase,  134 


subject,  3, 113 

subset  closed,  122, 124, 172 

suppression,  84, 103 

attacker-controlled,  95,  97, 100 
program,  97, 100 
surprise,  21,  31 
Sys,  127 

system,  22,  50,  85, 112, 133, 182 
system  product,  127 
system  property,  12 

T 

T  (trusted),  85 
taint  analysis,  83 
tainted,  83 

taxonomy,  1,  2, 10, 181 
termination,  17,  22,  88, 112, 161 
insensitive,  116, 161 
sensitive,  162 

TIRNI,  161 

topology,  14, 133-139, 176 
Plotkin,  14, 134, 136 
Scott,  134 
Vietoris,  14, 136 
trace,  11,  111 

ε (empty), 112
trace  property,  11, 110, 112 
transaction, 1
transmission, 84
true, 130
trusted, 83
Trusted Computer System Evaluation
Criteria, 5, 110
trusted projection, 87
TSRNI, 162

U

U (untrusted), 85
uncertainty, 36, see entropy
anomaly, 7, 15
flow metric, 7, 42, 43
vs. accuracy, 34
untainted, 83
untrusted, 83
untrusted projection, 87
user, 1, 22

V

Val, 17
validation, 83
Var, 17
variable, 85
VarL, 19
verification, 11, 110, 182
TJl, 136

W

well-foundedness, 12
while-programs, 18